NSWCDD/MP-93/172


PROCEEDINGS  OF  THE  1993  COMPLEX  SYSTEMS 
ENGINEERING  SYNTHESIS  AND  ASSESSMENT 
TECHNOLOGY  WORKSHOP  (CSESAW  '93) 

(20-22  JULY  1993) 


STEVEN  HOWELL,  COORDINATOR 


SYSTEMS  RESEARCH  AND  TECHNOLOGY  DEPARTMENT 


17  OCTOBER  1993 


Approved  for  public  release;  distribution  is  unlimited. 




NAVAL  SURFACE  WARFARE  CENTER 

DAHLGREN  DIVISION  •  WHITE  OAK  DETACHMENT 
Silver Spring, Maryland




NSWCDD/MP-93/172


PROCEEDINGS  OF  THE  1993  COMPLEX  SYSTEMS 
ENGINEERING  SYNTHESIS  AND  ASSESSMENT 
TECHNOLOGY  WORKSHOP  (CSESAW  '93) 
(20-22  JULY  1993) 


STEVEN HOWELL, COORDINATOR
SYSTEMS  RESEARCH  AND  TECHNOLOGY  DEPARTMENT 


17  OCTOBER  1993 


Approved for public release; distribution is unlimited.


NAVAL  SURFACE  WARFARE  CENTER 
DAHLGREN DIVISION • WHITE OAK DETACHMENT
Silver Spring, Maryland 20903-5640


1993 COMPLEX SYSTEMS ENGINEERING SYNTHESIS AND
ASSESSMENT TECHNOLOGY WORKSHOP (CSESAW '93)


As technology has developed, computer-intensive systems have
increasingly  become  extremely  large  and  complex,  controlling  a 
wide  variety  of  resources  and  operating  in  many  unforeseeable 
situations.  Many  of  today's  systems  have  hard  real-time, 
stringent  dependability,  intensive  security,  and  demanding  cost 
of  ownership  requirements.  They  are  typically  implemented  on  a 
combination  of  parallel  and  distributed  architectures  and  are 
embedded  within  a  human  organizational  structure  and/or  have 
human  operators  in  the  loop. 

This is the third year of the CSESAW (pronounced see-saw)
workshop. This year the workshop is co-sponsored by the Office of
Naval Research, the Naval Surface Warfare Center Dahlgren Division,
and Advanced Technology and Research (ATR). The workshop was
created to explore system level design synthesis and assessment
capabilities for large, complex systems. These capabilities will
facilitate the development of such systems from informal system
requirements, through the design phase prototyping, and into
implementation and post deployment. Component products produced
by these capabilities are specifications that subenvironments,
e.g., the Hardware Engineering Environment (HWEE), Software
Engineering Environment (SEE), and Human Computer Interaction
Engineering Environment (HCIEE), will receive. The focus of this
workshop is the development and integration of these multiple
technologies and the exploration of the creation of a system
level engineering discipline with support technologies to provide
potential high payoff solutions to the difficult problems
encountered by designers, developers, and maintainers of real-time
systems. The emphasis is on resolving system level
technology issues that cut across component boundaries, such as
those associated with system behavior requirements of real-time,
fault tolerance, cost, and security.

The emphasis of this year's workshop is integration. The
technologies and capabilities need to be integrated with the rest
of the engineering process. Therefore, the capability to provide
tight linkages to detailed design evaluation, systems forward
engineering, and systems reengineering must be developed,
ultimately providing a seamless overall engineering process. A
significant amount of effort has been put into component
technologies, such as hardware, microelectronics, memory,
databases, software, and man-machine interfaces. Major strides
have been made in these areas in the last few years. However,
the formal, systematic integration and engineering of these
components into an overall system has lagged far behind. For
large and complex systems with real-time, cost, dependability, and
security requirements, the problem is especially acute. This is
a direct result of the lack of a system level engineering
methodology.

We  welcome  you  to  this  year's  workshop.  We  hope  to  continue 
to  provide  in  the  workshop  an  atmosphere  in  which  the 
participants,  including  technology  developers,  researchers,  users 
and  customers  can  meet,  interact  and  exchange  ideas  on  relevant 
issues.  In  the  near  future  we  hope  to  be  able  to  say  that  this 
workshop  was  the  beginning  of  a  new  focus  on  systems  design  and 
evaluation technologies.

This  workshop  would  not  have  been  possible  without  the  hard 
work  of  many  people,  including  the  workshop,  program,  and 
advisory  committees,  authors,  presenters  of  the  submitted  papers, 
panel  members,  workshop  attendants,  panel  chairs,  and  breakout 
session  chairs.  A  very  warm  "Thank  You"  is  extended  to  all.  In 
particular,  we  wish  to  acknowledge  Michael  Edwards,  Ngocdung 
Hoang,  Cuong  Nguyen,  Michael  Jenkins,  Chuck  Sadek,  Kathy  Lederer, 
Adrien Meskin, and Dong Choi. Particular thanks go to
Elizabeth E. Wald and CDR Gracie Thompson of the Office of
Naval Research for tirelessly working for and supporting the
technology developments in this important area. Finally, we
would like to give special thanks to Phillip Q. Hwang, who has
chaired the workshop for the past two years, and whose insight
and foresight have made the workshop possible.

We  hope  you  have  a  productive  and  enjoyable  workshop! 

Steven L. Howell                  William Farr
Workshop General Chairman         Workshop Assistant Chairman


ABSTRACT


1993 COMPLEX SYSTEMS ENGINEERING SYNTHESIS AND
ASSESSMENT TECHNOLOGY WORKSHOP (CSESAW '93)


CSESAW '93 is exploring system level design synthesis and
assessment capabilities for large, complex systems. These
capabilities  will  facilitate  the  development  of  such  systems  from 
informal  system  requirements,  through  the  design  phase 
prototyping,  and  into  implementation  and  post  deployment. 
Component  products  produced  by  these  capabilities  are 
specifications  that  subenvironments  will  receive.  The  focus  of 
the  workshop  is  the  development  and  integration  of  these  multiple 
technologies  and  the  exploration  of  the  creation  of  a  system 
level  engineering  discipline  with  support  technologies  to  provide 
potential  high  payoff  solutions  to  the  difficult  problems 
encountered by designers, developers, and maintainers of real-time
systems. The emphasis of this year's workshop is
integration.  To  be  effective,  technologies  and  capabilities 
developed  need  to  be  integrated  with  the  rest  of  the  engineering 
process.  Therefore,  the  ability  to  provide  tight  linkages  to 
detailed  design  evaluation,  systems  forward  engineering  and 
systems  reengineering  must  be  developed,  ultimately  providing  a 
seamless  overall  engineering  process.  The  workshop  explores 
technology issues in providing this seamless process.


1993 COMPLEX SYSTEMS ENGINEERING SYNTHESIS AND
ASSESSMENT TECHNOLOGY WORKSHOP (CSESAW '93)

July 20-22, 1993

Holiday Inn
4095 Powder Mill Road
Beltsville, Maryland 20705


Tuesday, 20 July 1993

0730  Registration

0900  Workshop Overview
      Steve Howell, Naval Surface Warfare Center Dahlgren
      Detachment

0930  Automating the System Engineering Process
      John Rumbut, Naval Undersea Warfare Center

0945  Requirements Metrics: The Basis of Informed
      Requirements Engineering Management
      Robert J. Halligan, Technology Australasia Pty Ltd

1000  COFFEE

1015  Engineering and Analysis of Real-Time Systems
      Jay K. Strosnider, Carnegie Mellon University

1045  On the Structure and Dynamics of a Deeply-Integrated
      Information System
      Bruce I. Blum, Johns Hopkins University/Applied Physics
      Laboratory

1115  Integration Components, Spaces, and Cells
      Jeffrey O. Grady, General Dynamics

1130  An Efficient Approach to Systems Evolution (EASE)
      Thomas C. Choinski, Naval Undersea Warfare Center

1200  LUNCH

1300  An Overview of the Processing Graph Support Environment
      Roger Hillson, Naval Research Laboratory

1330  A Real-Time Object Model: A Step Toward an Integrated
      Methodology for Engineering of Complex Dependable
      Systems
      Kane Kim, University of California, Irvine

1400  A Methodology for Complex Computer Systems Engineering
      Robert L. Harrison, Naval Surface Warfare Center
      Dahlgren Division

1430  An Assessment Control Board (ACB) and a System
      Integration (SI) Program as Complements to the
      Configuration Control Board (CCB)
      Richard Evans, George Mason University

1445  A Generic Object Oriented Conceptual Pivot Model
      Naoufel Kraiem, University of Paris I

1515  The System Engineering Technology Interface
      Specification (SETIS): An Update
      Evan Lock, Computer Command and Control Company

1545  COFFEE

1600  RESEARCH AND TECHNOLOGY VISION PANEL
      Chair: Phillip Q. Hwang


Wednesday, 21 July 1993

0800  The Representation of Resources for Large-Sized and
      Complex Systems
      Nicholas Karangelen, Trident Systems Inc.

0830  A Software Metrics Integration Framework
      William M. Evanco, MITRE Corporation

0900  Measurement and Evaluation of Complex Navy System
      Designs
      Osman Balci, Virginia Polytechnic Institute and State
      University

0930  Optimal Selection of Failure Data for Predicting
      Failure Counts
      Norman F. Schneidewind, Naval Postgraduate School

1000  Design Structuring for System Engineering
      Jee-In Kim, Computer Command and Control Company

1015  COFFEE

1030  COMPUTER SECURITY TRADE-OFF PANEL
      Chair: Kathy Meadows

1200  LUNCH

1300  A Platform for Complex Real-Time Applications
      Alexander D. Stoyenko, New Jersey Institute of
      Technology

1330  A Testbed for Prototyping Distributed and Fault-
      Tolerant Protocols
      Farnam Jahanian, IBM T. J. Watson Research Center

1400  Architectural Synthesis of Mission-Critical Computing
      Systems
      Parameswaran Ramanathan, University of
      Wisconsin-Madison

1430  An Intelligent Real-Time System Assessment Tool
      Ed Andert Jr., Conceptual Software Systems Inc.

1500  An Environment for Analysis of Parallel Systems (EAPS)
      Mohsen Pazirandeh, Innovative Research Inc.

1530  A Dependable System Perspective
      Michelle Hugue, Allied-Signal Aerospace Company

1545  COFFEE

1600  COMPUTER BASED SYSTEM ENGINEERING (CBSE) ISSUES AND
      DIRECTIONS PANEL
      Chair: Dave Oliver


Thursday, 22 July 1993

0800  Real-Time Databases for Complex Embedded Systems:
      Predictability and Serializability
      Kwei-Jay Lin, University of Illinois

0830  Divide and Conquer Strategies and Underlying Lossless
      Principles
      Harold Szu, Naval Surface Warfare Center Dahlgren
      Division

0845  A Fault Injection Simulation Testbed for Analyzing
      Fault Tolerance Protocols
      William F. Dudzik, Advanced System Technologies Inc.

0900  Effectively Using the UNIX Make Utility for Permanent
      and Temporary Changes
      David H. Jennings, Naval Surface Warfare Center
      Dahlgren Division

0915  Utilization Bounds for Tasksets with Known Periods
      Swaminathan Natarajan, Texas A&M University

0930  A Stochastic Control Approach to Combined Task-Message
      Scheduling in Distributed Real-Time Systems
      Dar T. Peng, Allied-Signal Aerospace Co.

1000  COFFEE

1030  SYSTEM INTEGRATION PANEL
      Chair: Evan Lock

1200  LUNCH

1300  Comparing Formal Approaches for Specifying and
      Verifying Real-Time Systems
      Ralph Jeffords, Naval Research Laboratory

1335  Advanced Integrated Requirements Engineering System
      (AIRES): Processing of Natural Language Requirements
      Statements
      Richard Evans, George Mason University

1345  Requirements Management/Requirements Engineering
      (RM/RE)
      Luke Campbell, Naval Air Warfare Center

1415  Computer Security, Safety and Resilience Requirements
      as Part of Requirement Engineering
      Daniel Mostert, Rand Afrikaans University

1445  COFFEE

1500  REQUIREMENTS AND TRACEABILITY PANEL
      Chair: Stephanie White



CONTENTS

Page

DESIGN STRUCTURE

Automating the System Engineering Process . . . 2
    John Rumbut — Naval Undersea Warfare Center

Requirements Metrics: The Basis of Informed Requirements
Engineering Management . . . 9
    Robert J. Halligan — Technology Australasia Pty Limited

Engineering and Analysis of Real-Time Systems . . . 15
    Jay K. Strosnider — Carnegie Mellon University

On the Structure and Dynamics of a Deeply-Integrated
Information System . . . 29
    Bruce I. Blum — Johns Hopkins University/Applied Physics
    Laboratory

Integration Components, Spaces, and Cells . . . 38
    Jeffrey O. Grady — General Dynamics Space Systems Division

An Efficient Approach to Systems Evolution (EASE) . . . 43
    Thomas C. Choinski, John G. DePrimo — Naval Undersea
    Warfare Center

An Overview of the Processing Graph Support Environment . . . 49
    Roger Hillson — Naval Research Laboratory

A Real-Time Object Model: A Step Toward an Integrated
Methodology for Engineering of Complex Dependable Systems . . . 56
    Kane Kim, L. F. Bacellar — University of California, Irvine

A Methodology for Complex Computer Systems Engineering . . . 65
    Alexander D. Stoyenko, Lonnie R. Welch — New Jersey
    Institute of Technology; Robert L. Harrison,
    Harry Crisp — Naval Surface Warfare Center Dahlgren Division

An Assessment Control Board (ACB) and a System
Integration (SI) Program as Complements to the Configuration
Control Board (CCB) . . . 74
    Richard Evans — George Mason University

A Generic Object Oriented Conceptual Pivot Model . . . 81
    Naoufel Kraiem — University of Paris I

The System Engineering Technology Interface Specification
(SETIS): An Update . . . 94
    Baba Prasad, Moon Lee, Rajesh Puroshothaman, Evan Lock — Computer
    Command and Control Company


REPRESENTATION AND MEASUREMENT

The Representation of Resources for Large-Sized and
Complex Systems . . . 107
    Nicholas Karangelen, John Intintolo — Trident Systems Inc.;
    Ngocdung Hoang, Steve Howell — Naval Surface Warfare Center
    Dahlgren Division

A Software Metrics Integration Framework . . . 112
    William M. Evanco — MITRE Corporation

Measurement and Evaluation of Complex Navy System
Designs . . . 126
    Osman Balci, David DeVaux, Richard E. Nance — Virginia
    Polytechnic Institute and State University

Optimal Selection of Failure Data for Predicting Failure
Counts . . . 141
    Norman F. Schneidewind — Naval Postgraduate School

Design Structuring for System Engineering . . . 158
    Jee-In Kim, Evan Lock — Computer Command and Control Company

ASSESSMENT

A Platform for Complex Real-Time Applications . . . 172
    Alexander D. Stoyenko, Lonnie R. Welch, Carlos Amaro, Bo-Chao
    Cheng, Matthew Harelick, Xue Jin, A. K. Ganesh, Gray Yu — New
    Jersey Institute of Technology; Phillip Laplante — Fairleigh
    Dickinson University; Thomas J. Marlowe — Seton Hall University

A Testbed for Prototyping Distributed and Fault-Tolerant
Protocols . . . 179
    Farnam Jahanian — IBM T. J. Watson Research Center; Ragunathan
    Rajkumar — Carnegie Mellon University; John J. Turek — IBM
    T. J. Watson Research Center

Architectural Synthesis of Mission-Critical Computing
Systems . . . 185
    Raed Alqadi, Parameswaran Ramanathan — University of
    Wisconsin-Madison

An Intelligent Real-Time System Assessment Tool . . . 193
    Ed Andert, Jr. — Conceptual Software Systems, Inc.; Larry Peters —
    Software Consultants International Ltd.

An Environment for Analysis of Parallel Systems (EAPS) . . . 198
    Mohsen Pazirandeh — Innovative Research Inc.; Oliver McBryan —
    University of Colorado

A Dependable System Perspective . . . 207
    M. M. Hugue, N. Suri, C.J. Walter — Allied-Signal Aerospace
    Company


IMPLEMENTATION TECHNOLOGY

Real-Time Databases for Complex Embedded Systems:
Predictability and Serializability . . . 215
    Kwei-Jay Lin — University of Illinois; Sang H. Son — University
    of Virginia

Divide and Conquer Strategies and Underlying Lossless
Principles . . . 235
    Harold Szu, Edgar Cohen, John Wingate — Naval Surface Warfare
    Center Dahlgren Division

A Fault Injection Simulation Testbed for Analyzing Fault
Tolerance Protocols . . . 249
    William F. Dudzik — Advanced System Technologies, Inc.

Effectively Using the UNIX Make Utility for Permanent and
Temporary Changes . . . 257
    David H. Jennings, John J. Reilly — Naval Surface Warfare
    Center Dahlgren Division

Utilization Bounds for Tasksets with Known Periods . . . 265
    Dong-Won Park, Swaminathan Natarajan, Arkady Kanevsky — Texas
    A&M University

A Stochastic Control Approach to Combined Task-Message
Scheduling in Distributed Real-Time Systems . . . 273
    Dar T. Peng, Kang G. Shin — The University of Michigan

REQUIREMENTS

Comparing Formal Approaches for Specifying and Verifying
Real-Time Systems . . . 300
    C.L. Heitmeyer, R.D. Jeffords, B.G. Labaw — Naval Research
    Laboratory

Advanced Integrated Requirements Engineering System
(AIRES): Processing of Natural Language Requirements
Statements . . . 309
    James D. Palmer, Richard Evans — George Mason University

Requirements Management/Requirements Engineering
(RM/RE) . . . 316
    Luke Campbell — Naval Air Warfare Center-Aircraft
    Division, PAX

Computer Security, Safety and Resilience Requirements
as Part of Requirement Engineering . . . 324
    DNJ Mostert, SH von Solms — Rand Afrikaans University


PANEL PAPERS

Distributed Design of Computer-Based Systems:
Traceability . . . 364
    Stephanie White — Grumman Corporate Research Center

Distributed Design of Computer-Based Systems: Needed
Academic Programs . . . 366
    Julian Holtzman — CECASE/University of Kansas

Distributed Design of Computer-Based Systems: Methodology . . . 368
    David W. Oliver — GE Corporate Research and Development

Distributed Design of Computer-Based System . . . 370
    David G. Owens — Paramax Systems

Panel Description: Computer Security Tradeoffs . . . 373
    Catherine Meadow — Naval Research Laboratory

Appendix A — List of Panels . . . A-1

Appendix B — List of Attendees . . . B-1

DISTRIBUTION . . . (1)


Automating  the  System  Engineering  Process 

John  Rumbut 

Naval  Undersea  Warfare  Center 
Architecture  and  Computer  System  Division 
Architecture  and  System  Development  Branch 
Newport, RI 02841
(401) 841-3616

rumbut@ada.npt.nuwc.navy.mil


Abstract 

Navy system development is a participative process
carried out by two primary groups: customers/users
and producers/builders. These two groups
communicate amongst themselves in a terminology
and within a framework that is specific to their
respective domains. Currently, the main bridge of
communication between the two domains exists
primarily in the form of natural language (specifically
text and notional drawings). With "small" problems
this type of communication is adequate. However,
with "larger" complex systems the problem of
communication between the groups is exacerbated.
This paper begins to outline a generic framework for
system engineering but focuses on the needs for Navy
Combat Systems development.

Introduction 

Analysis  is  the  systematic  process  of  reasoning  about 
a  problem  and  its  constituent  parts  to  understand 
what  is  needed  or  what  must  be  done.  Analysis  thus 
involves communicating with many people. Initially,
those who are most familiar with the existing need
and its surroundings, that is, the problem domain,
must be contacted. Developers will also need to
communicate with the users, managers, and
maintainers because they are all potential sources of
new requirements. A method is needed to achieve a
common understanding of the problem domain. This
will give both the user of the proposed system
and the developer a means to ensure that they
understand each other during the development
process.

There  are  few  accepted  standards  for  process  tools 
and  there  are  no  guidelines  pointing  out  an  efficient 
usage pattern. A metric with which the effects or
influences of any tool or method can be predetermined
does not exist. There is currently no individual tool or
method that is appropriate or suitable to every
organization or problem domain, nor is there a
method by which to predict the future methodologies,
tools, or standards which will eventually emerge. The
mythical silver bullet still remains elusive.

Automation will have an important role in developing
solutions for large system development. However, we
must determine a desirable method to interface with
information. Ideally, the problem can be simplified to
an extremely large database consisting of problem
domain information with links (multiple front ends)
into the database. The links are determined by the
types of information being entered, queried, etc. With
large system development many different specialists
will be involved and each will have different
requirements as to what type of information is
required for them to perform their job correctly. This
will require different views ("linkages") of the problem
domain data.
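
The idea of one problem domain database with several specialist views can be sketched as follows. This is only a minimal illustration under stated assumptions, not a description of any Navy system: the table name `domain_item`, the view names, the `kind` categories, and the sample rows are all hypothetical and chosen for the example. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_problem_domain_db():
    """Create an in-memory database with one shared table and two specialist views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE domain_item (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            kind   TEXT NOT NULL,
            detail TEXT NOT NULL
        );
        CREATE VIEW timing_view AS
            SELECT name, detail FROM domain_item WHERE kind = 'timing';
        CREATE VIEW security_view AS
            SELECT name, detail FROM domain_item WHERE kind = 'security';
        """
    )
    # Hypothetical sample data, invented purely for illustration.
    conn.executemany(
        "INSERT INTO domain_item (name, kind, detail) VALUES (?, ?, ?)",
        [
            ("track update", "timing", "complete within 50 ms"),
            ("operator login", "security", "two-person authorization"),
            ("display refresh", "timing", "at least 10 Hz"),
        ],
    )
    return conn


def view_rows(conn, view_name):
    """Return the rows one specialist sees; view_name must be a known view."""
    if view_name not in ("timing_view", "security_view"):
        raise ValueError("unknown view: " + view_name)
    return conn.execute(
        "SELECT name, detail FROM " + view_name + " ORDER BY name"
    ).fetchall()
```

Each specialist queries only the view relevant to the job at hand, while all entries live in the single shared table, so a change made through one front end is visible to every other view.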

The  maintenance  of  these  linkages,  both  in  a 
horizontal  (across  the  problem)  and  vertical  (detailed) 
direction,  is  an  ideal  application  of  automation. 
However, this large amount of data causes what has
been referred to as the information glut. So much
data from many different areas needs to be effectively
managed. Concepts such as information filtering,
information retrieval, and collaborative filtering will
play a key role in our managing of this information
[5,6,7]. These are similar to problems that fall under the
category of library science.
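
One way to picture automated maintenance of horizontal and vertical linkages is a small link store that can report everything touched by a change. The sketch below is an assumption made for illustration, not a method from this paper: the class name `LinkStore`, its methods, and the item identifiers are hypothetical. Vertical links record refinement from a general item to a more detailed one; horizontal links record a relation between items in different parts of the problem.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical store of vertical (refinement) and horizontal (cross-problem) links."""

    def __init__(self):
        self.vertical = defaultdict(set)    # parent item -> more detailed items
        self.horizontal = defaultdict(set)  # item <-> related items elsewhere

    def refine(self, parent, child):
        """Record that child is a more detailed statement of parent."""
        self.vertical[parent].add(child)

    def relate(self, a, b):
        """Record a symmetric relation between two items."""
        self.horizontal[a].add(b)
        self.horizontal[b].add(a)

    def impacted(self, item):
        """Return every item reachable from item by following links."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            for nxt in self.vertical[current] | self.horizontal[current]:
                if nxt != item and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
```

A query such as `impacted("R1")` then acts as a simple information filter: a specialist reviewing a change to one requirement is shown only the linked items rather than the whole database. Note that vertical links are followed downward only in this sketch.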

A framework is needed that will allow for easier
bi-directional transition of information between the
customer and the system designer. Currently,
development life-cycles generate a confrontational
environment with each side attempting to interpret
documents in a way that will best represent their
organization. The success or failure of the product
depends upon high quality information exchange.

Some Simple Business Models Observation

System  development  projects  consist  of  three  basic 
types.  The  first  is  for  a  system  built  by  a  user  for  a 
user. Most of us have done this type of application
development, everywhere from building up a spreadsheet
macro for our checkbook to building a prototype
system to go onboard a military platform for
evaluation. Communication between the application
domain and development is obviously very high. This
type of relationship is what we hope for in all our
systems; how realistic this is, is questionable.

The  second  type  consists  of  a  developer  building  a 
product for a perceived user. This is the type of
system developed by companies like Microsoft,
Symantec, Borland, and other commercial vendors.
They perform market research and determine what
new product will sell.

The third type consists of a user describing a need to
a developer who then goes off and builds the new
system. This type of system can be seen in the types
described above if the systems are large enough to be
built by others. These groups, though, can still belong
to the same organization/agency but, because of the
diversity of the organization, may not have a complete
understanding of the problem domain.

As we examine each of these models we can see we go
from a very high fidelity communication model (user to
user) to a very difficult relationship in the user to
developer model. Navy systems mostly fall within the
third category. Communication between the
developer  and  the  customer  is  filled  with  noise  which 
hampers  information  exchange.  There  exist  several 
different  points  of  view  within  this  type  of 
development.  Managing  this  form  of  communication 
is  very  difficult. 

The  information  that  will  be  provided  from  the 
problem  domain  will  be  in  terms  of  that  domain. 
Whether  this  information  comes  from  documents, 
questionnaires,  demonstrations  of  existing  systems  or 
tutorials  provided  by  the  customer  to  the  developer,  a 
great  amount  of  time,  energy  and  training  is  needed 
for  the  designer  to  become  informed  enough  to 
generate a proposal. Usually the developer needs to
learn a new vocabulary, discipline, or method. Even if
the developer has worked with the problem domain
before, chances are the new system reflects changes of
technology within the domain, so new (potentially
radically  different)  functional  behavior  will  have  to  be 
learned. 


Information  exchange  occurs  again  when  the  designer 
submits  his  proposed  design  back  to  the  customer; 
this  time  the  customer  needs  to  be  trained.  Most 
software engineering firms are utilizing various CASE
(Computer Aided System Engineering) tools as part of
their specification/analysis effort. These tools will
support the designer by assisting in developing
graphical and textual information for the
specification. An assumption is made that the
customer will easily understand the resulting
documentation. This, however, is not always true.

Commercial informal graphical models (CASE tools)
have yet to be proven to be overly effective in system
development. The NASA space shuttle system project,
which did not use any CASE tools and used only
'pencil and paper' to manage its development
process, was very successful in its SEI (Software
Engineering Institute) process development review.
This leaves us questioning the
usefulness of the informal notation. Are the informal
graphical models useful? Or is it more important to
provide a disciplined structure to development?

Automated  tools  are  commonly  used  to  help  improve 
communication  and  system  understanding.  Software 
engineering  organizations  today  see  selection  and  use 
of  software  development  and  support  tools  as  crucial 
in  improving  personal  productivity  and  product 
quality.  Each  organization  is  different;  specific 
problem  domains  may  be  their  primary  business, 
which  can  require  a  certain  hardware  device  and/or 
programming  language.  Therefore,  each  organization 
will  have  specific  expectations  and  requirements  to  be 
addressed  by  automation. 

Often  the  customer  has  sent  out  requests  for 
proposals  for  a  new  system  to  different  competing 
firms.  This  is  done  with  the  hope  of  keeping  the  costs 
down  and  improving  the  quality  of  the  product  by 
stimulating  competition.  However,  since  each  of  these 
competing  firms  may  make  use  of  different  tools  and 
methodologies  it  will  become  very  difficult  for  the 
customer  to  evaluate  the  submitted  proposals 
adequately.  This  same  type  of  problem  can  easily 
occur in large projects where numerous subcontractors
(or  divisions  in  large  companies)  are  responsible  for 
different  subsystems.  Each  may  have  already  made 
a  significant  investment  in  a 
methodology/representation  tool.  How  does  one 
convert,  without  losing  information,  between  these 
different  tools?  Is  there  some  form  of  mapping  from 
one to the other? Some standard is required. CASE
tool research has focused on standardizing on the use
of a graphical representation.
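
The question of converting between tools without losing information can be made concrete with a round-trip check. The following is a minimal sketch under assumptions: "notation A" and "notation B" are two invented record layouts, not the formats of any real CASE tool, and the function names are hypothetical. A model is translated from A to B and back, and any field that does not survive the trip is reported.

```python
def a_to_b(model_a):
    """Translate hypothetical notation A (list of process records) to notation B."""
    return {
        p["process"]: {"consumes": list(p["inputs"]), "produces": list(p["outputs"])}
        for p in model_a
    }


def b_to_a(model_b):
    """Translate hypothetical notation B (dict keyed by process name) back to A."""
    return [
        {
            "process": name,
            "inputs": list(body["consumes"]),
            "outputs": list(body["produces"]),
        }
        for name, body in model_b.items()
    ]


def lost_fields(model_a):
    """Return, per process, the field names that do not survive A -> B -> A."""
    round_trip = {p["process"]: p for p in b_to_a(a_to_b(model_a))}
    report = {}
    for p in model_a:
        missing = set(p) - set(round_trip[p["process"]])
        if missing:
            report[p["process"]] = sorted(missing)
    return report
```

In this invented example notation B has no place for a timing attribute, so a model that carries a `deadline_ms` field in notation A loses it in translation and `lost_fields` names it. The check only detects fields that disappear; it does not detect values that change, which a fuller comparison would also need to cover.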

Before  the  designer  became  proficient  with  the  model 
used  by  the  CASE  tool,  a  significant  amount  of 
training  was  required.  The  designer  went  to  training 
sessions, used the technique to develop small scale
applications (threw the initial attempts away and
then tried again) and worked with other team
members who had experience with using the
tool/model. The customer will need similar skills and
experience  in  order  to  interpret  the  design 
documentation  correctly.  The  customer  may  not  have 
to  be  an  expert  in  using  the  tool  but  the  customer 
must  be  able  to  interpret  and  understand  the  model 
representation  of  the  decomposition. 

Unfortunately,  this  expertise  rarely  exists  on  the  side 
of the customer. This means that the participants in
this process are often divided between those who
understand abstraction and formal terminology of
system/software development and those who do not.
For the latter, formal terminology is an unacceptable
method of determining system feasibility. However,
without formalism, the specification of a system
cannot be a basis for development or analysis. If a
specification is vague then the entire development
process will be serendipitous. This will cause cost
overruns,  increased  length  of  project  development 
time,  and  a  possible  lack  in  desired  system 
capabilities. 

As mentioned earlier, how will competing firms be
evaluated by customers if each is using different
notations? The problem domain specialists are not
necessarily system engineering notation specialists
and have neither the inclination nor the time to learn
numerous notations (and become expert) in order to
evaluate proposals. This appears to provide a strong
disconnect between developer and customer. A
simple solution to this is for the customer to
standardize on a set of notations and force all bidders
to use this set of notations. Is there a universal set of
notations that will satisfy all types of Navy
applications? Do we need a method of categorizing
problems and using this information to best select a
development model?

Methodologies  and  supporting  tools  must  be  carefully 
matched  to  the  adopting  body  in  order  to  facilitate  the 
development process. The choice of the wrong tools
can not only fail to improve the process, but can
actually work against it. A universal set of rules has
not been devised to aid users in selecting the
'optimum' set of tools, nor does a set of 'generic' tools
currently exist that will satisfy the needs of all
organizations and problem domains. This is not
unusual; universal models of any sort are hard to
come by and are often never completely accepted.

Process  and  the  System  Engineering  Database 

When  a  need  has  been  determined  for  a  new  Navy 
application,  feasibility  studies  will  be  performed  to 
determine  whether  the  system  needs  to  be  built.  If 
the  need  is  strong  enough  a  specification  of  what  is 
required will be developed. Let's call this Capture
Point 1 (CP1). Some of the questions we need to
answer  are: 

What  determines  the  success/failure  of  our 
feasibility  study? 

How  do  we  effectively  estimate  costs? 

Can  we  estimate  technology  shortfalls? 

Is  this  similar  to  something  we  have  done 
before? 

Does or can reuse play a role?

Assuming  that  a  need  for  a  new  system  was 
determined, the Navy will generate a specification of
what needs to be built. The Navy will have
responsibility for generating this document that will
be delivered to competing contractors who may wish
to bid on the work for this new system. We'll call this
Capture  Point  2  (CP2).  Some  of  the  questions  at  this 
point  are: 

What  are  the  information  needs  of  those  who 
must  start  this  process? 

How  do  we  capture  information  from  the 
feasibility  studies? 

What  form  should  this  data  be  in  for  the 
bidder  to  adequately  bid  upon  this 
new  system? 

What  is  the  metric  for  determining 
completeness  of  the  specification? 

The developers involved will generate a proposal back
to  the  Navy  for  evaluation.  This  is  Capture  Point  3 
(CP3).  Again  we  have  some  questions  we  want  to 
answer  at  this  point: 

In  what  format  (graphical,  text  and/or 
prototype)  should  it  arrive  back  for 
evaluation? 

How  do  we  ensure  completeness  in  the 
proposal? 

Can we automatically rate portions of the
proposal? 

Are the cost estimates accurate?

Is the technological approach correct?

The  last  capture  point  is  the  development  process 
itself. This embodies all of the work of the developer
and how we manage this process activity. This is
capture  point  four  (CP4).  We  will  not  cover  this  part 
in  this  paper. 

The system engineering framework needs to unify all
of these capture points into a single framework, no
matter in what format they are currently stored. The
framework must have the ability to collect data from
a user community with diverse functional interests.
This information has to first be evaluated, validated
and tied together in a cohesive package. At some
point this package will be judged ready to be
available for others to review and submit bids for the
work of product development. Packages will be
returned to the Navy in a format that is 'tied' to the
original specification generated by the Navy. The
proposal package from each developer is accepted into
the system and then these proposal packages are
evaluated against each other and the original
specification to determine a final candidate. The
framework supports bringing the problem to a
solution space and then managing the system
throughout its lifetime. The framework should also
support supplying information (for reuse) to other
projects to be built in the future.

A key element of large scale system development is
understanding other participants' perspectives. By
increasing the amount of communication among all
participants a better appreciation for others' needs
should emerge. We can achieve this by collecting
information from these sources and providing retrieval,
filtering and analysis of this information. We have
already started much of this. What we have not done
is effectively 'tie' all of these sources together.

As  the  data  is  entered  into  the  database  it  will  need 
to be filtered. The data will need to be entered in a
manner that allows for retrieval in a variety of ways.
First we need to handle the data base horizontally.
What this means is we want to handle the individual
sources of information (documents, charts, graphs,
code, multimedia or other information
representations). We propose to manage this in the
form of a Dynabase: a dynamic database containing
the notes, sketches, papers and other documents that
are created over time. Much of the information that
organizations collect today is already in a machine
readable form. In a sense then, the creation of a
dynabase is a byproduct of everyday work. Once
information is in the dynabase, however,

how  will  it  be  viewed  and  retrieved? 

To  what  extent  will  users  have  to  add 
retrieval-enhancing information such
as  key  words? 

To  what  extent  will  the  system  be  able  to 
generate  such  clues  by  analyzing  a 
document  or  the  context  in  which  it 
was  created? 

These are difficult questions, but if they sound
insurmountable, do not despair: the system does not
have to be perfect, just better than it is today. [1]
Rather than having to track down information in
paper form (which often does not even include indexes),
go looking for a copy of a floppy with a
document/chart on it, or do a file search through
directories, we can use more automated
searching/tracing techniques. An example of this is
the Wide Area Information Servers (WAIS). This is a
free-text search which is highly amenable to parallel
computing (WAIS is implemented on a Thinking
Machines Connection Machine). In most
text-retrieval systems, queries are limited to Boolean
combinations of a few terms, but since text search on a
Connection Machine is fast, searches for documents
which are similar to an entire document are practical.
This same technique is used in Dow Jones's
DowQuest, a commercial system that uses a
Connection Machine to scan more than 150,000
articles from 195 publications for relevance to on-line
queries. [1]
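
The idea of retrieving documents that are similar to an entire document can be illustrated with a small sketch. It is not the WAIS or DowQuest algorithm; it assumes a plain term-count vector for each document and ranks by cosine similarity. The corpus contents and the names term_vector, cosine_similarity and rank_similar are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(query_doc, corpus):
    """Return (score, name) pairs for every corpus document, best match first."""
    q = term_vector(query_doc)
    scored = [(cosine_similarity(q, term_vector(text)), name)
              for name, text in corpus.items()]
    return sorted(scored, reverse=True)


# Hypothetical miniature dynabase; a real one would hold many documents.
corpus = {
    "timing_memo": "radar track update timing budget exceeds processor capacity",
    "cost_note": "hardware cost estimate for the display console",
}
ranking = rank_similar("processor timing budget for radar track update", corpus)
```

Here the whole query document, not a Boolean combination of a few terms, drives the ranking; each document's score is independent of the others, which is what makes this kind of search easy to spread across parallel processors.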

Lotus Notes is a commercial venture in providing tools
for organizing people and diverse information sources.
There are several other firms which have entered the
field commonly known as groupware. This technology
has stressed the importance of keeping all parties
informed. What makes this technology very useful for
our purposes is that it links (or ties) information not
just vertically like traditional CASE tools but
horizontally across numerous project functions,
thereby increasing the amount of positive exchange of
information between all developers and customers.

The information should be used to assist the
customer and the developer in identifying critical
areas and errors and in suggesting possible
remedies. For example, with development data on
line it may be possible to determine system schedules
by accessing each of the development area's
databases. It should also be possible to query the
system to see how many functions are assumed to
have access to a device or provide timing budget
information so system performance can be
determined. This information is available online, not
reported every month. With the resultant information
from these types of data analysis we hope to provide
insight into our system shortfalls. This type of
checking is simplistic. If we are to manage numerous
forms of information we have to add more intelligence
to our automation.
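
The simple queries described above (which functions assume access to a device, and whether the timing budgets add up) might look like the following sketch. The record layout, field names, device names and numbers are all hypothetical stand-ins for data that would come from the development areas' databases.

```python
from collections import defaultdict

# Hypothetical records, one per function, as they might be pulled from the
# separate development-area databases. Field names are illustrative only.
functions = [
    {"name": "track_update", "devices": ["radar_bus"], "budget_ms": 12.0},
    {"name": "display_refresh", "devices": ["console", "radar_bus"], "budget_ms": 20.0},
    {"name": "log_events", "devices": ["disk"], "budget_ms": 5.0},
]


def functions_per_device(records):
    """Map each device to the names of the functions that assume access to it."""
    users = defaultdict(list)
    for rec in records:
        for device in rec["devices"]:
            users[device].append(rec["name"])
    return dict(users)


def total_budget_ms(records):
    """Sum the timing budgets claimed by all functions."""
    return sum(rec["budget_ms"] for rec in records)


def over_budget(records, frame_ms):
    """True when the summed budgets exceed the available frame time."""
    return total_budget_ms(records) > frame_ms


access = functions_per_device(functions)
```

Because the records are read directly from the live data, a check such as over_budget(functions, 30.0) can be run at any time rather than waiting for a monthly report.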

In [2] we worked on the problem of symbolic/numeric
representation. In that work we managed the state
space by observing trends in both the measured and
computed data. On the basis of the observations, the
knowledge base would generate a solution to the
problem or generate a new set of parameters for
reevaluating the state space. We accomplish the
generation of symbolic information by detecting
thresholds and patterns [2]. Thresholds are numeric
values for such things as in the example above. As
thresholds are reached, symbolic representations are
given to the data set. These data sets are evaluated
against our rule set to look for higher order patterns.
When we examine the set we can determine what
errors there are (if any) and where they are located by
using our linked data. A useful technology for
working this problem is a blackboard architecture
(Figure 1).
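
A minimal sketch of the threshold step, assuming a single numeric quantity (processor utilization) and one illustrative rule, is given here. The threshold values, labels and rule are hypothetical and are not taken from [2].

```python
def symbolize(value, thresholds):
    """Return the label of the highest threshold that value has reached.

    thresholds is a list of (limit, label) pairs sorted by ascending limit;
    values below the first limit get the label "normal".
    """
    label = "normal"
    for limit, name in thresholds:
        if value >= limit:
            label = name
    return label


# Hypothetical thresholds for processor utilization (fraction of capacity).
UTILIZATION = [(0.70, "high"), (0.90, "critical")]


def trend(samples):
    """Classify a series of numeric samples as rising, falling, or steady."""
    if samples[-1] > samples[0]:
        return "rising"
    if samples[-1] < samples[0]:
        return "falling"
    return "steady"


def evaluate(samples):
    """Apply one illustrative rule to the symbolic form of the data."""
    level = symbolize(samples[-1], UTILIZATION)
    direction = trend(samples)
    if level == "critical" or (level == "high" and direction == "rising"):
        return "flag for review"
    return "no action"
```

The rule works only on the symbols ("high", "rising"), not on the raw numbers, which is what lets a rule set look for higher order patterns across data sets of different kinds.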

Large  diverse  clusters  of  data  need  to  be  used  in  a 
synergistic  pattern  in  order  for  system  engineers  to 
better  manage  the  process  of  development.  A 
potential  analysis  tool  is  blackboard  architecture. 
[3,7]  A  very  basic  blackboard  concept  is  shown  in 
Figure  1.  The  blackboard  model  is  a  complex 
problem-solving  model  prescribing  the  organization  of 
both  knowledge/data  and  the  problem-solving 
behavior within a single architecture. This is similar
to the familiar structured model of computation which
has a program acting on data. A blackboard model
consists of a global database called the blackboard.
There are logically independent sources of knowledge
which are called the knowledge sources. The
knowledge sources will respond to changes in the
blackboard to generate new hypotheses about the
data. (There are several different models for
blackboard architectures; see [6] for more information
on blackboards.)
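
The basic mechanism (a shared blackboard, with independent knowledge sources that react to changes and post new hypotheses) can be sketched as follows. This is a deliberately small illustration with no scheduler or control component; the class, the two knowledge sources and the entry keys are hypothetical.

```python
class Blackboard:
    """Global database shared by all knowledge sources."""

    def __init__(self):
        self.entries = {}          # key -> value posted so far
        self.sources = []          # registered knowledge sources

    def register(self, source):
        self.sources.append(source)

    def post(self, key, value):
        """Record an entry and let every source react to the change."""
        if self.entries.get(key) == value:
            return                 # nothing changed, so nothing to react to
        self.entries[key] = value
        for source in self.sources:
            source(self, key, value)


def budget_source(board, key, value):
    """Knowledge source: hypothesize an overrun when cost exceeds budget."""
    have_both = all(k in board.entries for k in ("cost", "budget"))
    if key in ("cost", "budget") and have_both:
        overrun = board.entries["cost"] > board.entries["budget"]
        board.post("hypothesis:overrun", overrun)


def alert_source(board, key, value):
    """Knowledge source: turn an overrun hypothesis into a recommendation."""
    if key == "hypothesis:overrun" and value:
        board.post("recommendation", "review cost estimate")


board = Blackboard()
board.register(budget_source)
board.register(alert_source)
board.post("budget", 100)
board.post("cost", 120)
```

Neither knowledge source calls the other; they cooperate only through entries on the blackboard, so further sources can be added without changing the existing ones.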

Typically the blackboard architecture is using these
sources to find 'patterns' to help solve a problem
jointly. For managing this data the solution may
include finding issues that are common to a cluster of
users. Simple examples may include keyword
searches. Each author of a document submits the
paper to the system. The system searches through
the document and compares it with some keywords it
may already have in its data base. The author of the
document may have made changes since the last time
the document was entered into the system. The
system can recognize that there were changes and
attempt to associate those changes with possible
side-effects to other sections of the system under
development. These changes can impact others in
technical, cost and schedule areas. Low level
information sources are clustered and the effects of
this information are brought to the attention of the
appropriate personnel.
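
As a rough sketch of the keyword example, the code below compares two versions of a submitted document, finds the lines that changed, and uses a keyword index to decide who should be told. The index contents, team names and document text are hypothetical, and a line-by-line comparison is only one simple way to detect changes.

```python
import re

# Hypothetical keyword index: which development areas care about which terms.
KEYWORD_OWNERS = {
    "timing": ["performance team"],
    "cost": ["program office"],
    "interface": ["integration team", "performance team"],
}


def changed_lines(old_text, new_text):
    """Return lines present in the new version but not in the old one."""
    old = set(old_text.splitlines())
    return [line for line in new_text.splitlines() if line not in old]


def affected_parties(old_text, new_text):
    """Collect the owners of every keyword found in the changed lines."""
    notify = set()
    for line in changed_lines(old_text, new_text):
        for word in re.findall(r"[a-z]+", line.lower()):
            notify.update(KEYWORD_OWNERS.get(word, []))
    return sorted(notify)


old = "The radar interface uses bus A.\nUpdate timing is 50 ms."
new = "The radar interface uses bus A.\nUpdate timing is 80 ms."
```

With these two versions, only the timing line differs, so only the owners of the keyword "timing" would be notified.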

These searches for patterns are not limited to simple
keywords. It should be possible to access information
from all of the dynabases that have been collecting
information. It is possible, for example, to perform
evaluations of data dictionaries of data flow diagrams
and compare them with legacy systems to see if there
are candidates for reuse and/or mapping of performance
requirements to potential COTS (Commercial
Off-the-Shelf) hardware/software components. (Currently
there is an effort underway in the COSIP program to
develop such a database to support this kind of
effort.) Obviously not all factors can be evaluated,
but a lot of the mundane consistency checks can be
automated or at least given some level of
assistance (e.g. checking the government's
specification against the contractor's proposal).


Simple Blackboard Model
Figure 1


This type of activity would need to run in a batch type
environment. Once the knowledge base has been
created we envision the user navigating through both
raw and processed information. By processed we
mean that the knowledge sources have made some
recommendations to the user which they may want to
follow up. We feel this is closer to how the system
engineer works: in two directions, both vertical and
horizontal. By having a
tool kit that saves engineers from spending their
time looking for information, evaluating trivial
information or reporting information, they should be
able to focus more on the problem solving issues.
This type of information merger also provides a better
picture to management, who need to monitor budget
and schedule.

System  development  is  a  process  of  discovery,  where 
as  you  add  more  information  and  evaluate  it  against 
previous  knowledge  you  will  find  that  more 
information  is  needed  or  the  wrong  information  was 
given or new problems are exposed, etc. Each time
you increase the granularity of knowledge you face the
potential of finding errors. For complex systems help
is needed in evaluating all of this information. In the
area of reuse, we feel that incorporating previous
work at earlier stages of development will help ensure
maximum  reuse. 

Robust  standards  will  play  key  roles  in  making  large 
system  development  more  manageable.  Providing  a 
developer a standard to 'design to' rather than having
to  develop  new  technologies  to  meet  the  needs  will 
obviously  reduce  some  degrees  of  freedom  but  it  will 
also  help  bound  the  problem.  The  effectiveness  of 
this  will  be  premised  on  the  quality  of  the  standards. 
These  standards  need  to  be  fully  evaluated  towards 
issues  such  as  performance  before  they  are  specified. 

Forcing the use of hardware/software/humanware
/security/communication standards will be difficult.
Quite often a developer's cultural attitude is not to
accept work from some other place, the classic 'not
from my shop' syndrome. Fortunately, newer
technologies such as object-oriented technologies are
changing this perspective. The culture here is 'why
should I rebuild what has already been done?'. This
attitude is already common in hardware development,
where it is far from cost effective to generate a new
hardware device when using existing ones is much
more cost effective. The hardware culture here is
about reuse. They proactively search for new sources
of reusable components and develop skills and tools
for solving problems with this limited set of
components.

The  hardware  developer's  task  is  somewhat  limited. 
The  software  aspect  has  to  encapsulate  the 
functionality  of  what  the  user  wants  and  logically 
control the hardware resources. This mapping to the
hardware is dynamic and difficult to predict. But
standards are a good method of reducing the number
of unique mappings. These standards will need to be
part of our database/knowledge sources; they will
provide convenient 'bounding' to our problem solving
activity.

Future: Object Technology, Visual Technology, Assemblers

In  the  past  when  we  were  faced  with  problems  of  high 
degrees  of  complexity  we  raised  the  level  of 
abstraction  to  manage  the  problem.  Where  we  had 
once  used  assembly  language  as  the  language  to 
build  our  systems  we  now  use  high  level  languages 
such as Pascal, Ada and C++. We do not attempt to
manage  the  assembly  code  generated  by  these 
languages.  We  argue  that  it  is  time  again  to  raise 
the  level  of  abstraction.  In  the  beginning  part  of  this 
paper  we  discussed  ways  of  automating  the  system 
engineering  process  to  help  improve  communication 
between  the  developer  and  the  customer.  These  are 
solutions  for  today.  For  tomorrow  we  envision  a 
different relationship altogether.

Instead of having developers work with the customer
as in model three, developers will provide classes of
objects. These classes will be accessed with visual
environments that will help the user understand their
functionality. Since there are differences between the
rather simple behavior of a hardware device and the
wide range of functionality of a software component,
additional information should be made available to
the user. This can include multimedia presentations
and graphical representations (data flow, object, Petri
nets, etc.) besides textual descriptions. The user will
'assemble' these classes together at a high level of
abstraction using graphical techniques.

Visual  programming  languages  (such  as  Prograph) 
and  even  simulation  tools  (like  SES/Workbench  and 
CACI) have added iconic representations of elements
that can be connected in some digraph form. These
elements make up a rich set of primitive functions
that can be further defined by selecting properties of
each of the elements. Metamodeling allows the user
of  such  an  iconic  model  to  build  new  representations 
from  these  primitives  to  better  represent  their 
problem  under  study. 

The  problem  of  reuse  and  repository  science  is  still  an 
open  issue.  The  metamodel  philosophy  allows  us  to 
tailor  generic  techniques  to  match  individual  needs. 
This  would  require  developers  to  work  on  the 
environment supporting the users doing the
development rather than performing the development
themselves. The user is working more like in the first
model which had the highest level of communication
and the developer is working more like the second
(perceived need) rather than the third model (user to
developer). The communication between the
developer and the customer should be of higher
quality  and  in  shorter  spurts  since  communicating 
will  focus  on  construction  of  a  particular  class  rather 
than  an  entire  system. 

Developers  will  work  more  on  building  up  class 
libraries  of  information  for  the  customer  to  use  to 
build  the  system.  This  is  similar  to  how  hardware 
development occurs now. A hardware vendor, using
business  model  two,  sees  a  need  for  a  particular 
hardware  device.  They  build  the  device  and  market  it 
to  their  customers.  After  a  while  whole  catalogs  of 
these  devices  are  available,  like  a  TTL  data  book, 
and  users  will  build  their  new  systems  from  these 
catalogs.  Work  has  already  been  done  in  this  area  for 
VHDL,  where  they  are  building  libraries  of  models  of 
devices  for  simulation  tools. 

These  are  not  new  concepts  but  technologies  such  as 
blackboards,  parallel  computers  to  speed  up 
searches,  a  new  cultural  attitude  towards  reuse  and 
better  graphical  capabilities  make  these  ideas 
possible. The framework we spoke of earlier can
become  the  repository  of  this  new  assembler 
technology. 

Summary 

Our research has led us not to have a single 'static'
representation of the system but rather to express the
problem domain in a dynamic fashion. This has led
to developing a unified informal/formal paradigm. We
attempt to express the information, the same
information, in numerous forms and allow for various
methods of information retrieval. It's important that
this  representation  be  in  a  form  that  all  people 
involved  are  comfortable  with  and  basic  enough  to 
allow  for  communication  across  different  domains. 

As reuse/reengineering/repository technologies mature,
new business models will emerge.
Visualization/Commercial Off-the-Shelf hardware will
allow users to develop more of their system than
previously done.

This paper addressed a simple wish list of what a
system engineering framework should do. It is
heavily biased by comments made by a small set of
system engineers and my own preferences. In no way
is this a comprehensive list of what is required to
meet the system engineer's needs but it is a step in
the direction of specifying what an engineering station
could look like.

Abstract

Available  data  demonstrates  that 
defective  requirements  are  a  dominant  cause  of 
cost  and  schedule  overrun  in  defense  and 
aerospace  programs.  This  paper  presents  a 
structured  methodology  for  measuring  the 
quality  of  requirements,  individually  and 
collectively.  It  is  shown  that  requirements  may 
be  characterized  by  ten  quality  factors,  each 
with an associated metric, and by an overall
requirements quality metric. In addition, the
requirements engineering process itself can be
instrumented by means of five process-related
metrics. The paper describes the author's
experience  with  application  of  both  types  of 
metric  to  engineering  decision  making.  A  tool 
which  automates  aspects  of  metrics  collection  is 
presented. 

1  Introduction 

Requirements  engineering  deals  with  the 
capture, analysis, expression and traceability of
requirements.  Requirements  engineering  may 
commence  at  the  level  of  a  broad  statement  of 
military  need,  and  will  continue  through  the 
definition  of  the  system  solution,  right  down  to 
the  lowest  levels  of  specification  of  elements  of 
that  solution,  for  example,  C,  D  and  E 
specifications  in  the  hardware  world  and 
minispecs  in  the  software  world. 

Requirements engineering does not simply
happen; it requires management. Classically,
management is considered to comprise planning,
organizing, staffing, monitoring and controlling.
If we accept that "that which cannot be
measured cannot be controlled", the role of
requirements metrics is readily apparent.

But which metrics? Should we instrument
the product (the requirements) or the process
(the  requirements  engineering  process)  or  both? 
How  can  requirements  metrics  be  used  to  help 
the  project  team  satisfy  project  success  criteria? 
These  and  related  issues  are  addressed  below. 


2  The State of the Requirements Art

Data  from  TRW  developed  in  the  early 
1960s  showed  that,  on  a  range  of  representative 
projects, 30 per cent of design problems requiring
correction were due to erroneous or incomplete
specifications. Another 24 per cent of errors were
due to conscious deviation from product and
process requirements. Other studies [1] have
shown that the cost to correct an error typically
increases by a factor of between 20 and 1000 over
the life cycle of a system acquisition. System
solutions which satisfy the contract, but not the
need, are, unfortunately, commonplace.

Engineering  practitioners  have  come  to 
regard improved requirements engineering as
one of the challenges of the 90's. The responses
to this challenge have included:

•  early,  concurrent  development  of  product 
and  process  requirements  covering  all 
product life cycle phases, from concept
through to disposal [2];

•  improved analysis of requirements by the
use of operational requirements languages
and associated tools, for example RDD [3];
and

•  management of requirements through
integration of text processing and relational
database (or similar) support [4], resulting
in improvements in requirements
traceability and in the productivity of
requirements  analysis  and  flowdown 
activities. 

This  latter  trend  has  brought  with  it  a 
tendency,  highly  beneficial  in  the  author's 
view,  to  manage  all  program  requirements  as  a 
single set. Requirements may be readily
allocated across all elements of the program,
for example the prime mission product(s),
project management, system engineering, test
and evaluation, production, etc., and their
interfaces. Within each of these program
elements, requirements may be decomposed and
allocated  to  lower  level  elements,  product 
interfaces  and  functional  interfaces. 

3  Users of Requirements

Since  requirements  define  the  product  or 
process  to  be  realized,  it  is  axiomatic  that  the 
success  of  any  program  is  closely  linked  to  the 
adequacy of definition and communication of
requirements: 

•  users  rely  on  requirements  as  a  precise 
expression  of  their  need; 

•  the  program  office  relies  on  requirements  for 
eliciting  offers; 

•  both  the  customer  and  the  contractor  rely  on 
requirements  as  an  expression  of  their 
agreement  as  to  what  is  to  be  delivered;  and 

•  the functional elements of the project
organizations  of  both  the  customer  and  the 
contractor  rely  on  requirements  as  an 
expression  of  what  they  are  to  deliver  to 
their  respective  internal  customers. 

4  Requirements  Quality 

Requirements,  to  satisfy  their  users,  must,  in 
their  expression,  exhibit  certain  attributes.  We 
refer  to  these  attributes  as  requirements  quality 
factors.  The  author  has  found  that  a  set  of  ten 
requirements  quality  factors  is  necessary  to 
adequately  define  the  quality  of  requirements, 
individually  and  collectively. 

Correctness  refers  to  an  absence  of  errors  of 
fact in the statement of requirement.

Completeness  requires  that  the  requirement 
contain  all  of  the  information  necessary, 
including  constraints  and  conditions,  to  enable 
the  requirement  to  be  implemented  such  that 
the  need  will  be  satisfied. 

Consistency  requires  that  a  requirement  not 
be  in  conflict  with  any  other  requirement,  nor 
with  any  element  of  its  own  structure. 

Clarity  requires  that  the  requirement  be 
readily  understandable  without  semantic 
analysis. 

Non-Ambiguity requires that there be only
one semantic interpretation of the requirement.

Connectivity refers to the property
whereby all of the terms within the
requirement are adequately linked to other
requirements and to word and term definitions,
so causing the individual requirement to
properly relate to the other requirements as a
set.

Singularity  refers  to  the  attribute  whereby 
a requirement cannot sensibly be expressed as
two  or  more  requirements  having  different 
subjects,  verbs  and/or  objects. 

Testability refers to the existence of a finite
and objective process with which to verify that
the requirement has been satisfied.

Modifiability requires that:

a.  necessary changes to a requirement can be
made completely and consistently; and

b.  the same requirement is specified only once.

Feasibility requires that a requirement be
able to be satisfied:

a.  within natural physical constraints;

b.  within the state-of-the-art as it applies to
the project; and

c.  within all other absolute constraints
applying to the project.

5  A  Requirements  Structural  Model 

Requirements  are  most  commonly  expressed 
as  natural  language  statements,  although 
graphical  and  formal  mathematical 
requirements  languages  are  widely  used. 

For  the  natural  language  type  of  expression, 
requirements  quality  metrics  may  be  developed 
through  the  parsing  of  each  requirement 
statement  into  the  elements  of  a  structural 
model  of  a  sound  requirement,  a  template.  A 
template  found  to  be  suitable  for  English 
language  requirement  statements  is  illustrated 
in  figure  1  [after  5].  Figure  1  also  shows  an 
example  requirement  parsed  into  the  template. 


Original Requirement:

In the Combat Zone, an HQ Switch, which is identical to a Trunk node switch, shall be given
two (2) independent links to at least two (2) other nodes in the network.

Element                               Text

1  Actor                              an HQ Switch
2  Conditions for Action              In the Combat Zone
3  Action                             shall be given
4  Constraints of Action
5  Object of Action                   two (2) independent links
6  Refinement/Source of Object
7  Refinement/Destination of Action   to at least two (2) other nodes in the network
8  Other                              which is identical to a trunk node switch

Figure 1 - Requirement Structural Template

Elements of the template are defined,
generally in accordance with Fuji [5], as below:

Actor/Initiator of Action. This is the subject
of the sentence - the thing being specified.
Examples are: "the system", "the interface",
"the function", ...

Action. This is a verb - the action to be
taken by the actor (subject). Examples are
"shall calculate", "shall display", "shall fly".

Object of Action. This is a noun, and is the
thing acted upon by the actor. Examples are:
"the message", "the input signal", ...

Conditions of Action. This defines the
conditions under which the action takes place,
for example "upon receipt of a message", "in
high resolution mode", "within 10 minutes of
power-on", ...

Constraints of Action. This qualifies the
action, for example "at a resolution of 400 x 1000
pixels", "within limits imposed by vehicle
speed", ...

Refinement/Source of Object. These qualify
the object, for example (refinement): "of flash
priority", for example (source): "from
DISCON".

Refinement/Destination of Action. These
further qualify the action, and may be
additional to Constraints of Action. Examples
are "within 10ms", "to DISCON".

Other. This element collects non-requirements
material.

6  Requirements  Quality  Metrics 

A  strong  requirement  will  have  each 
applicable  element  of  the  requirement,  and  the 
requirement  overall,  satisfying  each  of  the 
quality  factors  described  earlier.  This  ideal 
provides  a  basis  for  the  development  of 
requirements  quality  metrics. 

Figure  2  illustrates  the  construction  of  a  set 
of  metrics  based  on  parsing  of  a  requirement  into 
the  template. 


These  metrics  are  defined  below. 

IRQ Individual Requirement Quality

This metric for a single requirement is a
number between 0 and 1, 1 representing a
"perfect" requirement and zero representing a
totally defective requirement. The metric is
constructed from the parsed version of the
requirement by:

a. determining which of the possible seven
elements of the structure are applicable and
assigning a value of 1 to each applicable
element (most requirements have 5-7
applicable elements);

b. assessing each element of the parsed
requirement against the quality factor
criteria, and scoring each applicable
element as 1 (satisfactory) or 0
(unsatisfactory). An element may be
unsatisfactory because it is missing, or
because it is defective in some other way. A
variant on the approach is to permit
individual element scores between the
limits of 1 and 0, although it is doubtful
whether this refinement offers any
significant benefit;

c. calculating the metric by dividing the sum
of the element scores by the sum of the
applicable element values.
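
Steps a to c can be sketched in Python as follows. This is a minimal illustration rather than the author's tooling: the function name `individual_requirement_quality` and the dictionary layout are assumptions made for the example, and the sample values are the applicable/score columns of the worked example in Figure 2 (six applicable elements, one scored satisfactory).

```python
# Elements of the structural template that can carry a score.
# "Other" is left out because it collects non-requirements material.
ELEMENTS = [
    "Actor",
    "Conditions of Action",
    "Action",
    "Constraints of Action",
    "Object of Action",
    "Refinement/Source of Object",
    "Refinement/Destination of Action",
]


def individual_requirement_quality(applicable, scores):
    """Return IRQ = (sum of element scores) / (sum of applicable values).

    applicable: element name -> 1 if the element applies, else 0 (step a)
    scores:     element name -> score between 0 and 1 (step b)
    """
    total_applicable = sum(applicable[e] for e in ELEMENTS)
    if total_applicable == 0:
        raise ValueError("a requirement needs at least one applicable element")
    # Only applicable elements contribute to the score total (step c).
    total_score = sum(scores[e] for e in ELEMENTS if applicable[e])
    return total_score / total_applicable


# Values taken from the Figure 2 example.
applicable = dict(zip(ELEMENTS, [1, 1, 1, 0, 1, 1, 1]))
scores = dict(zip(ELEMENTS, [0, 0, 0, 0, 0, 0, 1]))
irq = individual_requirement_quality(applicable, scores)
print(f"IRQ = {irq:.2f}")
```

For the Figure 2 values the result is 1/6, that is, a requirement with a single satisfactory element out of six applicable ones.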


IQF1-IQF10 Individual Quality Metrics

Ten individual (requirement) quality
metrics correspond to the ten requirement
quality factors, as follows:

    IQF1    Correctness
    IQF2    Completeness
    IQF3    Consistency
    IQF4    Clarity
    IQF5    Non-Ambiguity
    IQF6    Connectivity
    IQF7    Singularity
    IQF8    Testability
    IQF9    Modifiability
    IQF10   Feasibility

    Element                             Applicable   Score
    Actor                                   1          0
    Conditions of Action                    1          0
    Action                                  1          0
    Refinement of Action                    0          0
    Object of Action                        1          0
    Refinement/Source of Object             1          0
    Refinement/Destination of Action        1          1
    TOTAL                                   6          1

    Quality Factor     Metric Name   Metric Value
    Correctness        IQF1          0
    Completeness       IQF2          0
    Consistency        IQF3          1
    Clarity            IQF4          1
    Non-Ambiguity      IQF5          0
    Connectivity       IQF6          0
    Singularity        IQF7          1
    Testability        IQF8          0
    Modifiability      IQF9          1
    Feasibility        IQF10         1

Figure 2 - Construction of Requirement Quality Metrics
These metrics assume, for an individual
requirement, a value of 1 or 0 depending on
whether the requirement overall has a defect of
that type (0) or not (1). Again, scoring between
these range limits may be used if desired.

The metrics for individual requirements
rarely directly serve a useful purpose. It is
necessary to aggregate individual requirements
metrics to form metrics for groups of
requirements in order to serve our objective of
control. In making this transposition to
aggregate metrics, we have consistently found
the need to adjust completeness to allow for
requirements which are missing altogether, not
just incomplete in the sense of missing a
condition or a refinement.

Requirements which have been omitted
may be accounted for by estimating an omission
ratio for each requirement that is present. The
omission ratio is the number of new requirements
that would be created if all possible areas of
omission suggested by the requirement that is
present were pursued to resolution. The omission
ratio must be constructed so as to support
aggregation of requirements having different
omission ratios.

The quality metrics for sets of requirements
correspond to, and are produced from, the
individual metrics, as follows (for n
requirements):

RQ Requirements Quality

    RQ = \frac{1}{n} \sum_{i=1}^{n} IRQ_i

QF1 Correctness

    QF1 = \frac{1}{n} \sum_{i=1}^{n} IQF1_i

QF2 Completeness

    QF2 = \frac{1}{n} \sum_{i=1}^{n} IQF2_i - \frac{1}{n} \sum_{i=1}^{n} \mathrm{omission\ ratio}_i

Note that completeness may have a
negative value.

QF3 to QF10 are derived as for QF1.
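
The aggregation can be illustrated with a short Python sketch. It assumes that each set-level metric is the arithmetic mean of the corresponding individual metric over the n requirements, and that QF2 is the mean completeness score less the mean omission ratio; the second assumption is a reading of the completeness adjustment described above, not a confirmed formula. The function names and sample data are hypothetical.

```python
def mean(values):
    """Arithmetic mean over the n requirements in the set."""
    values = list(values)
    if not values:
        raise ValueError("at least one requirement is needed")
    return sum(values) / len(values)


def requirements_quality(irq_values):
    """RQ: mean of the individual IRQ values (assumed aggregation)."""
    return mean(irq_values)


def quality_factor(iqf_values):
    """QF1 and QF3..QF10: mean of the individual 0/1 factor scores."""
    return mean(iqf_values)


def completeness(iqf2_values, omission_ratios):
    """QF2: mean completeness score less the mean omission ratio.

    The subtraction is an assumption; it is what allows QF2 to go negative
    when many requirements are estimated to be missing altogether.
    """
    if len(iqf2_values) != len(omission_ratios):
        raise ValueError("one omission ratio is needed per requirement")
    return mean(iqf2_values) - mean(omission_ratios)


# Hypothetical set of four requirements.
iqf2 = [1, 0, 1, 1]       # individual completeness scores
omission = [0, 2, 1, 3]   # estimated new requirements implied by each one
qf2 = completeness(iqf2, omission)
print(qf2)
```

With these made-up inputs the mean completeness score is 0.75 and the mean omission ratio is 1.5, so QF2 comes out negative, which is the situation the note on negative completeness refers to.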

7 Application of Requirements Quality Metrics

A metric is only of value if it assists in
decision making.

Areas of application of the metrics
described above are summarized in Table 1.

Metrics should only be used where they
contribute positively to the degree of
satisfaction of project goals, including cost
goals.


Metric: RQ Requirements Quality

Application:

• estimation of requirements-related bidding risk/opportunity
(depending on the type of contract)

• estimation of requirements-related contract risk/opportunity

• determination of the skills and level of resources required for
requirements analysis

• measurement of the quality of the product of requirements
analysis, in relation to decisions such as:

a. termination of formal requirements analysis;

b. whether the project is ready for System Requirements
Review (SRR), Software Specification Reviews (SSR) and
other requirements reviews;

c. whether system requirements are sufficiently mature for
establishment of the functional baseline;

d. whether requirements are sufficiently mature for
establishment of the allocated baseline;

• assessment of the specification writing skill levels of project
team members

• estimation of requirements-related subcontract
risk/opportunity

• use as a technical performance measurement (TPM) parameter

Metric: QF1-QF10 Requirements Quality Factors

Application:

• identification of aspects of requirements which are
unsatisfactory

• identification of requirements-related skills in which training
of project personnel is needed

• use as a TPM parameter

Table 1 - Application of Requirements Quality Metrics
8 Typical Values of Requirements Quality Metrics

Our experience in use of the metrics suggests
the typical relationships between values of the
metrics and requirements quality shown in Table
2.

9  Requirements  Process  Metrics 

Table 1 indicated the application of
requirements quality metrics. We have also
found it beneficial to use, for engineering
management purposes, requirements process
metrics, derived for requirements analysis tasks
such as system requirements analysis, software
requirements analysis for CSCIs and hardware
requirements analysis for HWCIs.

Useful metrics include:

RSTA Percent Started

This metric indicates the percentage of
source requirements currently under
development, the "work in progress".

RTBD Percent "To Be Determined"

This metric indicates the percentage of
requirements containing TBDs, i.e., requirements
for which the resolution of incompleteness is
beyond the resources of the analyst and which
have been referred to other individuals,
organizations or phases for resolution of missing
information.

RCOM Percent Completed

This metric indicates the percentage of
source requirements for which, in the analyst's
view, analysis has been completed.

RAPP Percent Approved

This metric indicates the percentage of source
requirements for which the results of analysis
(child requirements) have been approved for
incorporation in the destination document.

In addition, the need to control the process
of formally decomposing and allocating
requirements of an element in the system
hierarchy to its subordinate elements has led to
an additional metric:

RALL Percent Allocated

This metric indicates the percentage of parent
requirements of an element at one level of the
WBS for which the corresponding child
requirements have been allocated to the
applicable lower level elements.

All of the above process metrics provide
data for earned value measurement within
project cost/schedule control systems. In
addition, RTBD has proved to be a useful
parameter for incorporation into a technical
performance measurement (TPM) program [2].
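
The percentage metrics above reduce to simple counts over status records. The following Python sketch shows one way they might be derived; the `SourceRequirement` record and its field names are hypothetical, and treating each status as an independent flag is an assumption made for illustration. Percent Allocated could be computed the same way over the parent requirements of an element.

```python
from dataclasses import dataclass


@dataclass
class SourceRequirement:
    """Hypothetical status record kept for each source requirement."""
    in_progress: bool = False  # analysis currently under way
    has_tbd: bool = False      # contains an unresolved "to be determined"
    completed: bool = False    # analyst regards the analysis as complete
    approved: bool = False     # child requirements approved for the document


def percent(requirements, flag):
    """Percentage of requirements for which the named flag is set."""
    if not requirements:
        return 0.0
    count = sum(1 for r in requirements if getattr(r, flag))
    return 100.0 * count / len(requirements)


def process_metrics(requirements):
    """Return the four source-requirement process metrics as percentages."""
    return {
        "RSTA": percent(requirements, "in_progress"),
        "RTBD": percent(requirements, "has_tbd"),
        "RCOM": percent(requirements, "completed"),
        "RAPP": percent(requirements, "approved"),
    }


# Made-up example: four source requirements at different stages.
sample = [
    SourceRequirement(in_progress=True, has_tbd=True),
    SourceRequirement(completed=True),
    SourceRequirement(completed=True, approved=True),
    SourceRequirement(),
]
metrics = process_metrics(sample)
print(metrics)
```

Recomputing the dictionary at each reporting period gives the trend data that an earned value or TPM report would plot.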


Column key:

    A  Very poor set of requirements, requiring substantial development
    B  Fair set of requirements, may just be suitable for purposes of
       solicitation, depending on the SOW and type of contract envisaged
    C  Requirements at SRR suitable for carrying forward into development
    D  Requirements suitable for establishment of the Functional Baseline

    Metric                A          B          C           D
    RQ                    0.01-0.3   0.3-0.7    0.95-0.99   0.99+
    QF1-Correctness       0.9        0.98       0.99        0.99+
    QF2-Completeness      -5         0          0.95        0.99+
    QF3-Consistency       0.9        0.97       0.99        0.99+
    QF4-Clarity           0.9        0.97       0.99        0.99+
    QF5-Non-Ambiguity     0.3        0.7        0.9         0.98+
    QF6-Connectivity      0.3        0.9        0.99        0.99+
    QF7-Singularity       0.1        0.3        0.99+       1
    QF8-Testability       0.1        0.7        0.99        0.99+
    QF9-Modifiability     0.1        0.5        0.99        0.99+
    QF10-Feasibility      0.95       0.99       0.99+       0.99+

Table 2 - Typical Values of Requirements Quality Metrics
10 Computer Support to Metrics Generation

Requirements management benefits
substantially from the use of computer based
tools which facilitate, in particular, efficient
text handling, rigorous requirements allocation
and the creation and maintenance of peer and
parent-child relationships for requirements
traceability purposes. Metrics prove to be most
easily calculated where a CASE environment is
in use for those other aspects of requirements
management.

One CASE tool for requirements
management with which the author has
experience is Document Director ReqMgr,
produced by Bruce G. Jackson & Associates, Inc.
A prototype software package which automates
storage of metrics-related requirements quality
and process data and which progressively
builds up requirements quality and process
metrics has been built for use with Document
Director ReqMgr.

Proprietary tools known to the author are
also being utilized in a similar way by other
organizations.

11  Conclusions 

Numerous best practice standards (ISO
9001, Software Engineering Institute criteria,
MIL-STD-499B) emphasize a closed loop
process as a key to effective technical
management. The metrics described in this paper are
a means of implementing closed loop control
over the requirements engineering process.

The cost of implementing these metrics
within a suitable, existing CASE environment
appears to be around two percent of the cost of
the total requirements engineering effort. The
engineering manager must decide whether the
resulting payoff will exceed this cost. Sufficient
data to conclusively answer this question has
not yet been developed by the author, nor has it
been identified from other sources.

Assessment of the cost-effectiveness of the
use of requirements metrics must therefore, for
the present, be subjective. It is the author's
assessment that requirements metrics,
developed on a sampling basis, used within a
suitable CASE environment, provide
considerable leverage in satisfying the goals of
complex systems development.

Greatest leverage is obtained where
sampling techniques are used in metric
development. Such sampling may focus on, say,
every nth requirement, or on areas of perceived
risk.
Abstract

The following paper presents a unified framework for reasoning about timing correctness on arbitrary
serially reusable resources. The proposed approach bridges the gap between real-time scheduling theory
and its implementation on physical resources via scheduling models. We define scheduling models as
abstractions that can be used to reason about timing correctness on physical resources. We argue that
a consistent set of scheduling models for all shared resources encapsulated in the System Engineering
Workbench (SEW) enable the real-time systems architect to quickly explore the system design space,
to establish and maintain a firm performance baseline, to facilitate system resource partitioning and
management, to quantitatively evaluate hardware/software boundary issues, to optimize system
configuration parameters, and to assess the impact of new technologies. Further, we argue that scheduling
models operate at the right level of abstraction not only for system performance validation, but also for
interfacing with the Software Development organization. Scheduling models of CPU/Operating Systems,
backplane buses, disk subsystems, and local area networks have been developed, including the following
specific models: Real-Time Mach, MWaveOS, Futurebus+, Microchannel Architecture, FDDI, and IEEE
802.5 token rings.


*This research is supported in part by grants from the Office of Naval Research and the Naval Ocean Systems Center under
contract N00014-81-4-1804


1  Introduction 


Developing large, complex, distributed systems is a trying evolutionary process with technical and
financial difficulties for both the contractor and the customer. Aggressive development schedules, coupled
with the inherent complexity of these systems, forced systems development into concurrent engineering
practices, long before there was such a thing as concurrent engineering. Projects moved from conceptual
design to detailed design to unit hardware/software integration and test, and onto systems integration
and test as much by programmatic definition as by technical maturity at each stage. Schedule pressures
generally result in inadequate up-front Systems Engineering, which leads to incomplete and inadequate
development specifications for the hardware and software design/development organizations. As the
system moves towards sell-off, the problems inevitably snowball. The development process often degenerates
into an interactive fire-drill between the customer, Systems Engineering and the development groups.

Given this preamble, the approach advocated in this paper will go a long way towards alleviating the
problem by providing methodologies that can be encapsulated into System Engineering friendly tools. These
tools will enable the System Engineer to quickly explore the systems-level design space in a
quantitative manner and answer the question, "Will the system work?" at any point in the development process.
Further, the System Engineering Workbench (SEW), with its associated analytical framework, directly
supports concurrent engineering by providing the appropriate level interface to the software and hardware
development organizations. The Systems Engineer thus works directly with the development
organizations reacting, resolving, and validating the inevitable changes, while simultaneously being the resource
management/allocation watchdog.


The proposed solution strategy has three core components: first develop a consistent set of scheduling
models for all system resources, second develop a set of the appropriate figures of merit (FOMs) to support
quantitative design decisions, and third encapsulate the scheduling models and FOMs into a user friendly
tool, the System Engineering Workbench. We do not discuss the figures of merit in the paper. Section 1.1
presents a generic framework for reasoning about timing correctness of physical resources that forms
the basis for the scheduling models. Section 1.2 provides an overview of the SEW tool and Section 2
summarizes how to design real-time systems using the proposed approach.

1.1 Scheduling Models

Scheduling theory holds great promise as a means to a priori validate the timing correctness of
real-time applications. However, there currently exists a wide gap between idealized scheduling theory and
the implementation realities of building large, distributed systems composed of real processors with real
operating systems communicating over real buses and real networks supported by real disk subsystems.
Specifically we propose a set of consistent scheduling models which accurately model the timing and
concurrency behavior of system services software supported by the underlying system hardware resources.

We define system services software to be all the software which provides the infrastructure upon which
the application software runs. This includes: operating systems, database management systems, network
management systems, user interface/window management systems, etc. Each of these system services
software components is associated with managing its associated hardware resource, i.e. central processing
unit (CPU), disks, networks, displays, etc. Currently, scheduling theory assumes an idealized resource
with zero overhead and perfect preemptability. The scheduling models reported here extend scheduling
theory to address the overhead and limited preemptability costs of scheduling application software via
system services software running on real hardware assets. One simply cannot answer the question, "Will
the system work?" without correctly addressing the overhead and limited preemptability issues.

Consider the ideal scheduling equations for fixed and dynamic priority scheduling summarized in Table 1.
The fixed priority scheduling equations are general to any fixed priority assignment. The dynamic priority
utilization-based equation applies for either earliest deadline or least slack time algorithms. The dynamic
time-based formulation applies specifically to the earliest deadline algorithm [1]. A similar algorithm for
least laxity could also be developed. In general the time-based, exact case equations provide necessary and
sufficient conditions for schedulability whereas the utilization based equations provide less tight sufficient
conditions for schedulability. The exception is the 100% utilization bound for idealized earliest deadline
scheduling which is a utilization-based necessary and sufficient schedulability bound.
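
The distinction between utilization-based and time-based checks can be illustrated with textbook forms of the ideal tests. The Python sketch below is not a transcription of Table 1: it uses the classical rate monotonic utilization bound n(2^(1/n) - 1), the 100% bound for earliest deadline scheduling, and a time-demand check at scheduling points for a given fixed priority order. It assumes integer run-times and periods, deadlines equal to periods, and tasks written as hypothetical `(C, T)` tuples listed highest priority first.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def utilization(tasks):
    """Total utilization: sum of C_i / T_i."""
    return sum(c / t for c, t in tasks)


def rm_utilization_bound(n):
    """Classical sufficient bound for rate monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def edf_utilization_test(tasks):
    """Ideal earliest deadline test: schedulable iff utilization <= 1."""
    return utilization(tasks) <= 1.0


def fixed_priority_time_test(tasks):
    """Time-demand check for tasks listed highest priority first.

    Task i passes if, at some scheduling point t no later than its
    deadline, the work demanded by tasks 1..i fits within t.
    """
    for i, (_, period_i) in enumerate(tasks):
        competing = tasks[: i + 1]
        # Scheduling points: multiples of each competing period up to T_i.
        points = sorted(
            {k * t_j for _, t_j in competing
             for k in range(1, period_i // t_j + 1)}
        )
        fits = any(
            sum(c_j * ceil_div(t, t_j) for c_j, t_j in competing) <= t
            for t in points
        )
        if not fits:
            return False
    return True


# Made-up task set (C, T), already in rate monotonic order.
tasks = [(1, 4), (2, 6), (3, 12)]
print(utilization(tasks) <= rm_utilization_bound(len(tasks)))  # bound check
print(fixed_priority_time_test(tasks))                         # time-based check
print(edf_utilization_test(tasks))                             # EDF check
```

For this task set the utilization is about 0.83, which exceeds the three-task rate monotonic bound of about 0.78, so the utilization check is inconclusive, while the time-based check still succeeds. That is the sense in which utilization-based conditions are less tight.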

Table 1: Ideal Scheduling Equations Summary
[Ideal time-based and utilization-based schedulability equations for fixed and dynamic priority
scheduling; the equations themselves are illegible.]

Note that Table 1 does not include the full conditions for checking schedulability for the time-based,
dynamic priority case due to space limitations. The cumulative work W_n(t) across the interval defined by
the busy period length, B(t), must be evaluated to check that each job of each task meets its deadline.
The algorithm that performs this check is given by:


Find the busy period of the task set.

For each task in the task set,

    For each job in the busy period,

        Find the completion time of the current job
        If the completion time > deadline, test fails, exit
        Adjust the idle time to the deadline of the next job
        While the current time < the arrival of the next job
            Increment the idle time to the next arrival
        Iterate forward


The dynamic, time-based check is a much more complicated test than the dynamic utilization-based
check and will yield the same result for the ideal case. It is included here for completeness and as a
precursor to the dynamic, time-based scheduling models presented later.

We now extend the equations summarized in Table 1 to include the implementation effects of overhead and
blocking. The extended equations summarized in Table 2 constitute a set of generic scheduling models
that will be used to reason about timing correctness on the various resources. The generic scheduling
models have three additional components:

• Overhead_i, which captures the task dependent scheduling overhead that can be directly bound to the
application task.

• Overhead_sys, which captures the task independent system level overhead encountered on some
resources.

• Blocking_i, which captures the time task i is delayed executing by lower priority tasks due to
imperfect preemptability.

Table 2: Generic Scheduling Model Summary
[Generic time-based and utilization-based scheduling models; the equations are illegible. The surviving
fragments show each run-time C_j replaced by C_j + Overhead_j, with additive Overhead_sys and Blocking_i
terms.]

The Overhead_i component effectively increases the run-time C_i. Other than increased loading, it has no
other effects on the schedulability of the task set. The Overhead_sys component often shows up as a
periodic system level task that can be readily incorporated into the scheduling framework. The Blocking_i
component due to imperfect preemption degrades the fixed priority, time-based formulation and the
dynamic priority utilization-based formulations from necessary and sufficient schedulability conditions to
sufficient schedulability conditions. With imperfect preemptability it is no longer necessarily true that
the response time of the first job of a task in critical zone phasing is always the longest [2]. Thus only
the time-based dynamic priority formulation remains a necessary and sufficient condition for
schedulability. We further note that the time-based approach [1] was specifically developed to address the impact of
overhead and blocking in dynamic priority scheduling algorithms. For a rigorous proof of the sufficiency
conditions for the generic scheduling models see [3].
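
One way to see how the three components enter a fixed priority check is the textbook response-time recurrence, sketched below in Python. It is offered only as an illustration and is not the set of equations in Table 2: it assumes integer time units, deadlines equal to periods, tasks listed highest priority first, blocking charged once per job, and system overhead modeled as an extra periodic demand with hypothetical parameters `sys_overhead` and `sys_period`. Because it examines only the first job after a critical instant, it should be read as a sketch of where the terms appear, not as an exact test under imperfect preemptability.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def response_time(i, tasks, sys_overhead=0, sys_period=None):
    """Iterate R = C_i + Overhead_i + Blocking_i + interference(R).

    tasks: list of dicts with keys "C", "overhead", "blocking", "T",
           ordered highest priority first (illustrative layout).
    Returns the fixed point, or None if it exceeds the deadline T_i.
    """
    own = tasks[i]["C"] + tasks[i]["overhead"] + tasks[i]["blocking"]
    deadline = tasks[i]["T"]
    r = own
    while r <= deadline:
        # Higher priority tasks preempt with run-time inflated by overhead.
        interference = sum(
            ceil_div(r, t["T"]) * (t["C"] + t["overhead"]) for t in tasks[:i]
        )
        # System overhead behaves like an additional periodic task.
        if sys_period is not None:
            interference += ceil_div(r, sys_period) * sys_overhead
        nxt = own + interference
        if nxt == r:
            return r
        r = nxt
    return None


# Made-up task set, times in arbitrary integer units.
tasks = [
    {"C": 10, "overhead": 1, "blocking": 2, "T": 50},
    {"C": 15, "overhead": 1, "blocking": 2, "T": 100},
    {"C": 30, "overhead": 1, "blocking": 0, "T": 200},
]
results = [response_time(i, tasks, sys_overhead=1, sys_period=10)
           for i in range(len(tasks))]
print(results)
```

Setting `overhead`, `blocking` and `sys_overhead` to zero recovers the ideal case, so the same routine can be used to see how much of a response time is attributable to implementation effects.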

The generic scheduling models summarized in Table 2 can be applied to develop schedulability criteria
for concurrent real-time task execution on arbitrary system resources. Table 3 summarizes the dominant
effects that contribute to Overhead_i, Overhead_sys, and Blocking_i for CPUs/OSs, Buses, Disks and Local
Area Networks. Note that the scheduler's functionality shows up in very different forms on the various
resources. Operating systems schedule CPUs. Buses are scheduled using hardware implemented
arbitration protocols. Disks are scheduled with a combination of hardware and software in their controllers, and
LANs are scheduled via their Media Access Control (MAC) protocols. Scheduler/arbiter implementations
range from fully software implemented for operating systems to full hardware implementations on buses
and networks.

The Overhead_i component for CPU scheduling is composed mainly of the scheduler overhead along with
associated Interrupt Service Routines (ISRs), synchronization protocols, device drivers, etc. The bus
Overhead_i component tends to be dominated by the time for the arbitration lines to settle along with
addressing overhead and miscellaneous control functions. For disks, the Overhead_i component is
dominated by the physical movement of the head, and to a lesser extent the disk controller overhead. On
LANs, Overhead_i tends to be dominated by propagation delays along with addressing and miscellaneous
control functions.

The Overhead_sys component shows up on all the resources in different forms as well. Operating system
event handling and scheduling is typically done at discrete intervals defined by an underlying periodic
timer interrupt. This system overhead component is not bound to specific tasks and may be readily
modeled as one or more additional system level task(s), with a run-time of Overhead_sys and a period T_sys.
Backplane buses often must support refresh of dynamic RAM memory components. The overhead
associated with this function is readily modeled as a periodic Overhead_sys component. Similarly disks have
an Overhead_sys component due to the overhead from scanning. Although LANs typically do not have an
Overhead_sys component, FDDI is an exception. Assuming prioritized scheduling at each FDDI node and
the use of the synchronous mode protocol, the scheduling effects of the other nodes can be modeled as an
Overhead_sys component.

    Subsystem  Scheduler/Arbiter      Overhead_i               Overhead_sys   Blocking_i

    CPU/OS     S/W Protocol:          Scheduler, ISR, Synch,   Time-Tic       Time-Tic,
               Flexible, RMS,         Device Drivers, etc.                    Kernel-Funcs.
               EDS, etc.

    Buses      H/W Protocol: RR,      Arbitration, Addr.,      DRAM refresh   f(max. tenure)
               Pos. priority,         Sync, Misc. Control
               Msg. based priority

    Disks      S/W Protocol:          Head Movement            Scan overhead  f(max. tenure)
               RMS, EDS, Pscan,       Dominated, control S/W
               Tscan, etc.

    LANs       H/W Protocol:          Prop. Delay, Addr. &     TTRT task,     f(max. pkt, prop. delay)
               RR, Fixed priority     Misc. control            etc.

Table 3: Overhead and blocking for various sub-system components

The Blocking_i component arises on each of the resources due to imperfect preemptability. On CPUs the
Blocking_i component is generally closely tied to the timer interrupt rate that drives the scheduler. On
buses, the Blocking_i component is a function of the maximum transaction time on the buses. Similarly,
on networks, Blocking_i is a function of the maximum packet size as well as propagation effects. Disks are
generally nonpreemptable. Thus Blocking_i is generally a function of the maximum single transaction size
permitted.

The previous section introduced generic scheduling models as a means for bridging the gap between
idealized scheduling theory and the implementation realities associated with scheduling real processes on real
resources. Two new terms were introduced to account for implementation overheads: Overhead_i to
account for overhead components that can be bound to each individual task i, and Overhead_sys to account
for system overhead components that cannot be directly bound to individual task executions. A Blocking_i
term was introduced to account for the time that task i is delayed in execution due to imperfect resource
preemptability. We then summarized the dominant components of these overhead and blocking terms for
CPUs, buses, disks and LANs.

1.2 System Engineering Workbench

Designing an effective tool at the system level is a difficult task. We are currently building the System
Engineering Workbench to encapsulate the models, methodology and figures of merit. The following
paragraphs summarize the design of the SEW tool.

The SEW tool is an X-Window based system, with a user friendly interface that will allow the Systems
Engineer to interactively reason about real-time systems design via the complex theoretical models. The
philosophy of the tool is to provide an interface that will allow the engineer to easily add new models within
a consistent framework, to modify system parameters and quickly view the results, and to experiment with
system design ideas and verify that the timing correctness of the system will hold.

SEW consists of the following components: task set editor, subsystem and system editors, subsystem and
system simulators, and the scheduling analyzer. The following subtasks describe the function of each,
and the process that we will go through to develop each. Note that an X-Window prototype is currently
being developed, and that the bulk of the analytical software that operates behind the user interface is
already in place. A high level view of SEW is provided in Figure 1. The following paragraphs provide a
brief summary of the form and function of each of the major components of SEW.

•  Task  Set  Editon  This  component  of  SEW  provides  a  window  for  editing  the  application  task  de¬ 
scription.  This  window  will  allow  the  user  to  graphically  draw  interconnected  bubbles  representing 
tasks  and  their  interactions.  Each  task  can  also  be  individually  edited,  allowing  the  user  to  specify 
task  resource  requirements,  including  timing,  communications,  etc.  Eventually,  this  window  will  be 
hierarchical  in  nature,  allowing  common  tasks  to  be  grouped  together. 

•  Sub^stem  and  System  Editors:  This  component  of  SEW  provides  a  common  window  interface 
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Figure  1:  Systems  Engineering  Workbench  Overview 

for  creating  and  modifying  models  of  the  various  subsystems  and  the  complete  real*time  system. 
The  various  subsystems  will  be  edited  here,  allowing  the  user  to  diange  system  parameters,  such 
as  blocking  and  overhead  measurements.  The  system  editor  will  specify  the  physical  interconnects 
between  the  various  subsystems. 

•  Subsystem  and  System  Simulators:  An  important  part  of  the  SE!W  tool  is  the  ability  to  view 
the  expected  task  interactions.  This  window  will  allow  the  user  to  take  the  task  set  that  has  been 
created  and  simulate  its  execution  on  either  a  particular  subsystem  or  a  combination  of  subsystems 
into  a  system.  This  simulation  is  done  at  a  hi|h  level,  so  that  the  user  can  see  how  the  tasks  interact 

• Scheduling Analyzer: This final window is perhaps the most important; through it the user can
test the schedulability of the created task set running on the given subsystem or system. This
window will allow the user to test the schedulability under any fixed or dynamic priority scheduling
criterion. The scheduling analyzer will also allow the user to experiment with the system design
space, by varying system parameters and viewing the resultant effect against quantitative figures
of merit. Through this window, the user can view common real-time system metrics for the task
set, such as utilization, breakdown utilization, server capacity, and slack time, as well as find optimal
operating points for such things as operating system timer interrupt rate or minimum packet size
on a network.

Subsystem Type       Models Available
-----------------    ----------------------------
Operating Systems    RT Mach (68030, R3000, i960)
                     MWaveOS
                     Chimera II
Busses               Futurebus+
                     MicroChannel
Disks                Generic Model
Networks             IEEE 802.5 Token Ring
                     FDDI
                     IEEE 802.6 DQDB
                     ATM Switches

Table 4: Scheduling Models Summary
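
As a rough illustration of the kind of check the Scheduling Analyzer is described as performing, the sketch below computes task-set utilization and applies two classical utilization-based tests: the Liu and Layland bound for rate-monotonic fixed priorities, and the utilization limit of 1 for earliest-deadline-first dynamic priorities. The source does not say which tests SEW implements; the task names, run times, and periods are hypothetical, and deadlines are assumed equal to periods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str        # hypothetical task label
    run_time: float  # worst-case execution time C
    period: float    # period T (deadline assumed equal to the period)


def utilization(tasks):
    """Total processor utilization U = sum of C/T over all tasks."""
    return sum(t.run_time / t.period for t in tasks)


def rm_utilization_bound(n):
    """Liu-Layland bound n(2^(1/n) - 1) for n rate-monotonic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def passes_rm_bound(tasks):
    """Sufficient (not necessary) test for rate-monotonic fixed priorities."""
    return utilization(tasks) <= rm_utilization_bound(len(tasks))


def passes_edf_bound(tasks):
    """Exact test for earliest-deadline-first with deadlines equal to periods."""
    return utilization(tasks) <= 1.0


# Hypothetical task sets, for illustration only.
tasks = [Task("sensor", 1, 4), Task("control", 2, 8), Task("display", 3, 20)]
heavy = [Task("a", 2, 4), Task("b", 3, 6)]

print(passes_rm_bound(tasks), passes_rm_bound(heavy), passes_edf_bound(heavy))
```

A task set that fails the rate-monotonic bound is not necessarily unschedulable under fixed priorities; the bound is only sufficient, which is one reason a tool would also offer exact response-time analysis.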

The SEW tool is currently being developed and is reported here to emphasize the need for appropriate tools
to support the real-time systems engineer. This tool, supported by the underlying scheduling models and
FOMs, can greatly help move real-time systems engineering from a practice to a sound engineering
discipline. In Table 4 we summarize the scheduling models that have been developed to date.

2  Engineering  Real-Time  Systems  via  Scheduling  Models 

This paper presented a unified framework for reasoning about timing correctness on arbitrary, serially
reusable resources. The proposed approach bridges the gap between real-time scheduling theory and its
implementation on physical resources via scheduling models. We defined scheduling models as abstractions
that can be used to reason about timing correctness on physical resources. We argued that a consistent
set of scheduling models for all shared resources, together with relevant figures of merit encapsulated
in the Systems Engineering Workbench (SEW), enables the systems architect to:

• Validate system level response time performance. This core capability of the scheduling models
facilitates all the capabilities listed below. Given a set of task run-times and synchronization
requirements, the scheduling models allow the Systems Engineer to determine whether all response
time requirements will be met.

• Quickly explore the system-level design space, and establish a firm baseline. Given a consistent
set of scheduling models of the system's components (CPUs, buses, networks, disks, etc.),
the systems architect can then quickly evaluate the viability of arbitrary system configurations. We
provided examples that showed how the scheduling models could be used to decide which bus
arbitration scheme or LAN was most appropriate for a specific application.

• Expose the system overhead costs. The scheduling models directly expose the system overhead
costs associated with supporting various application functions. By exposing the costs of all system
functionalities, the system architect can then make sound engineering decisions as to whether he
can afford the additional functionality for his application.

• Facilitate system resource partitioning and management. The scheduling models provide the
right level of abstraction for the Systems Engineering organization to interface with the Software
Development organization. Initially, resource budgets can be assigned to the individual application
programmers which specify limits on CPU time, synchronization requirements, device I/O, etc. The
Systems Engineering organization validates the system's performance relative to these initial budgets.
As the development cycle proceeds and actual resource requirements become available, the
Systems Engineering organization can then check to make sure that the system level timing performance
is maintained. In this way the Systems and Software Development organizations can work
together and have confidence at any point in time that the current baseline system is feasible.

• Quantitatively evaluate hardware/software boundary issues. Systems Engineers for high
performance embedded systems are often required to evaluate the cost/performance tradeoffs associated
with special purpose hardware. Mraz [4] used a set of scheduling models as an initial
evaluation vehicle in his work developing a RISC-based architecture for real-time computation. He
developed scheduling models for a conventional CPU/OS pair, a processor supported by an operating
system coprocessor, and a dual-threaded RISC processor that injects the operating system as a
non-interfering execution stream in application pipeline stalls. Using these three models he was able to
quickly quantify the expected gains associated with the differing approaches.

• Explore the impact of new technologies. The science of building scheduling models for arbitrary
technologies is maturing. Thus, we expect to be able to fairly quickly develop consistent scheduling
models for any emerging technologies. For example, there is currently a lot of work in packet switching
networks. Although we have focused most of our effort on shared media LANs, we believe that
we can readily model most packet switching networks.

• Optimize system configuration parameters. One of the most important capabilities that scheduling
models provide is that they allow the System or Software Engineers to optimize the system
configuration parameters.
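
The first capability in the list above, validating response times from task run-times and synchronization requirements, can be made concrete with the standard fixed-priority response-time recurrence R = C + B + sum over higher-priority tasks j of ceil(R / T_j) * C_j, iterated to a fixed point. This is a minimal sketch under stated assumptions, not the scheduling models developed in the paper: the blocking term B stands in for synchronization delays, all times are integers in a common unit, and the task parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical task label
    run_time: int      # worst-case execution time C
    period: int        # period T
    deadline: int      # relative deadline D (assumed D <= T)
    blocking: int = 0  # worst-case blocking B from synchronization


def response_time(task, higher_priority):
    """Iterate R = C + B + sum(ceil(R / T_j) * C_j) until it converges.

    Returns the worst-case response time, or None once the deadline is exceeded.
    """
    base = task.run_time + task.blocking
    r = base
    while r <= task.deadline:
        # -(-r // T) is integer ceiling division.
        interference = sum(-(-r // hp.period) * hp.run_time for hp in higher_priority)
        next_r = base + interference
        if next_r == r:
            return r
        r = next_r
    return None


def validate(tasks):
    """Map each task name to its response time; tasks are listed highest priority first."""
    return {t.name: response_time(t, tasks[:i]) for i, t in enumerate(tasks)}


# Hypothetical task set, highest priority first.
task_set = [
    Task("sensor", run_time=1, period=4, deadline=4),
    Task("control", run_time=2, period=8, deadline=8, blocking=1),
    Task("display", run_time=3, period=20, deadline=20),
]
print(validate(task_set))  # {'sensor': 1, 'control': 4, 'display': 7}
```

A None entry means the task cannot be shown to meet its deadline under this model, which is the signal a systems engineer would use to revisit budgets or the configuration.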

The above work represents a solid first step toward moving real-time systems development from an art to an
engineering science. However, there is still much work to be done, such as folding in the work associated
with jointly scheduling aperiodic and periodic tasks [5, 6]. The Systems Engineering Workbench needs
to be completed. The capability to address fault tolerance requirements and provide graceful degradation
properties also needs to be incorporated.

This paper examines the structure and dynamics of a large, complex
information  system  that  has  been  in  operational  use  for  more  than  a  decade.  The 
paper  begins  by  explaining  how  the  study  of  this  system  is  related  to  the 
development  of  complex  systems.  It  then  introduces  the  modeling  philosophy 
used  in  the  system’s  design,  provides  an  overview  of  the  system’s  architecture,  and 
evaluates  13  years  of  experience  with  its  development  and  maintenance.  It  is 
shown  that,  even  though  the  organization  of  the  system  defies  all  current 
standards  of  good  practice,  the  resulting  system  is  reliable,  maintainable,  useable, 
and inexpensive to support. The paper closes with some questions about the true
nature  of  the  conduct  of  systems  engineering  for  complex  systems. 


Introduction 

For  the  past  half-dozen  years,  my  research  has  focused  on  the  nature  of  the  software 
process and how we may best implement and control it. To me, software engineering (which is
the  study  of  the  conduct  of  the  software  process)  is  a  form  of  systems  engineering.  Moreover, 
with  our  growing  reliance  on  software  tools  for  design,  test,  and  manufacture,  software 
engineering  is  coming  to  provide  an  excellent  model  for  equipment  fabrication.  Thus,  even 
though  the  topic  of  this  meeting  is  "complex  system  engineering,"  it  is  reasonable  to  accept  a 
large  software  system  as  a  specialized  instance  from  the  target  class  in  the  expectation  that  it  will 
provide  insight  into  the  role  of  automation  in  the  development  of  the  large,  complex,  fault 
tolerant,  distributed,  real-time,  time-critical  systems  of  the  future. 

This  paper  examines  the  13-year  history  of  a  large,  complex,  distributed,  clinical 
information  system  [1].  The  system  is  of  special  interest  because  it  was  developed  and 
maintained  using  an  environment  that  hides  most  of  the  implementation  details  [2].  The  result 
is  an  architecture  that  is  very  different  from  those  employed  in  typical  information  systems.  The 
system also is of interest in this conference because it is used to make life-threatening decisions,
it must be fault tolerant, and its data must be timely. Of course, the technology used for an
information system makes these last properties much easier to implement than they would be for
a Navy tactical system. Thus, this paper does not suggest that experience in a medical setting
is  directly  transferrable  to  a  military  application.  Rather,  the  paper  argues  that  if  characteristics 
of  this  clinical  information  system  seem  to  violate  conventional  wisdom,  then  perhaps  our 
perceptions  regarding  those  characteristics  were  poorly  founded.  It  would  follow  that,  if  we  must 
adjust  our  understanding  of  the  structure  of  information  systems,  then  perhaps  we  also  should 
alter  our  conventions  regarding  complex,  real-time  tactical  systems. 


On  Integration  and  Modularization 

Elsewhere  I  have  argued  that  software  development  is  a  modeling  activity  [3].  The 
models  range  in  granularity  from  how  a  computer-supported  product  can  resolve  some  domain 
need  (i.e.,  the  requirements)  to  how  the  computations  should  be  carried  out  (i.e.,  the  source 
code).  An  information  system  design  is  based  on  a  model  of  some  external  reality.  Surrogates 
for selected entities in that reality will be maintained as data in the system’s database. Because
the  external  reality  is  deeply-integrated,  the  database  should  reflect  its  integrated  character.  In 
many  cases,  however,  the  difficulty  in  implementing  a  system  that  operates  within  a  complex 
environment  forces  the  designers  to  abstract  away  many  of  the  important  properties  (and 
complexities)  of  the  external  environment.  This  reduces  risk  and  improves  the  processing  of 
selected  functions.  Unfortunately,  the  result  is  a  distortion  of  the  design  model,  which  restricts 
it to a segment of the real world model. As a consequence, it is difficult to extend system-
supported activities beyond the boundary imposed by the constrained system model.

For  example,  a  clinical  system  may  ignore  all  attributes  of  a  patient  other  than  the 
patient’s  schedule.  The  resulting  patient  scheduling  system  will  use  some  common  patient 
identifier  so  that  it  may  link  patient  schedule  activities  with,  say,  billing  activities.  Furthermore, 
an  integrated  patient  scheduling  system  will  associate  providers  (e.g.,  doctors,  nurses)  with  the 
patient  visits,  and  they  may  even  maintain  individual  provider  schedules  including  normal  office 
hours,  holidays,  and  other  absences.  Yet,  even  with  a  highly  complex  appointment  system,  the 
basic  activity  being  modeled  is  that  of  scheduling;  there  will  be  little  about  the  system  that  is 
unique  to  a  health  care  setting.  In  fact,  the  basic  system  would  be  equally  useful  for  a  beauty 
salon,  if  only  the  cost  per  visit  were  high  enough  to  justify  the  investment. 

The Oncology Clinical Information System (OCIS) was designed to assist the patient-
oriented activities in the Johns Hopkins Oncology Center. The model used to design OCIS is
derived  from  a  model  of  the  Center.  Patients  are  treated  as  inpatients  and  outpatients;  services 
are  provided  by  physicians,  nurses,  laboratories,  the  pharmacy,  and  registrars;  therapy  is 
organized  by  protocol  (both  research  and  standard  practice),  and  the  therapy  plans,  status,  and 
analysis  must  be  documented;  visits,  admissions,  and  resources  are  scheduled;  and  clinical  data 
are  combined  and  displayed  to  aid  decision  making.  This  model  is  highly  integrated.  For 
instance,  the  patient  record  must  contain  a  history  of  all  clinical  results,  diagnoses,  and  scheduled 
activities; nursing information must indicate patient activities, individual schedules, projected
staffing  needs,  and  resource  assignments.  In  fact,  the  complexity  of  the  model  precludes  its 
complete  definition.  The  model  evolves  as  experience  with  OCIS  builds;  moreover,  as  the  health 
care  team  becomes  familiar  with  OCIS,  use  of  the  system  alters  the  real  world  model  of  Center 
operations. 

I believe that deep integration and evolving understanding are characteristics of all large-
scale, complex systems. When the complexity exceeds our ability to manage it, or when our
understanding  does  not  permit  us  to  approach  the  complete  problem,  we  decompose  the  problem 
by  breaking  it  down  into  smaller,  more  manageable  components.  The  principle  of  information 
hiding  is  used  to  abstract  away  details  external  to  the  component  of  interest.  As  experience 
accumulates,  new  uses  for  the  modularized  components  are  recognized  and  new  decompositions 
are  experimented  with.  This,  of  course,  is  how  hardware  has  evolved.  Simon  suggests  that  this 
hierarchical  approach  is  the  essence  of  the  design  process  [4].  Yet  the  jumble  of  the  previous 
paragraph  suggests  no  such  structure.  Everything  seems  to  be  connected  to  everything  else; 
there  is  no  natural  order.  Imposing  an  order  by  decomposition  would  introduce  interfaces  that 
serve  to  disintegrate. 

The question is, can we develop a highly integrated design that evolves as our
understanding builds, and, if so, can we implement and maintain that design? This paper asserts
that the OCIS project experience answers both those questions in the affirmative. How this has
been accomplished has been documented elsewhere. What is important here is to recognize that
it  can  be  done.  Just  as  Bannister  opened  the  way  to  a  series  of  sub-four-minute  mile  records, 
I  would  hope  that  the  OCIS  example  would  motivate  others  to  examine  this  approach  and 
exceed  our  accomplishments. 


Overview  of  the  System 

In its present configuration, OCIS operates on a network of 5 computers, maintains a
distributed  database,  and  supports  250  terminals  located  throughout  the  Center.  The  database 
provides  online  access  to  some  half-million  days  of  care  for  the  20,000  cancer  patients  treated 
at  the  Center.  The  programs  comprise  a  million  lines  of  code,  but  this  code  is  generated  from 
compact  specifications.  In  early  1992,  the  OCIS  design  consisted  of  specifications  for  9257 
programs  and  a  data  model  comprised  of  2273  relations  (called  tables)  and  3823  attributes 
(called elements). The program specifications are very compact, averaging 15 lines in length.
The functionality delivered by a program specification is roughly the same as that provided by
a  300-line  program  in  a  more  verbose  language  such  as  COBOL.  Examples  of  program  functions 
are  the  management  of  a  menu  (together  with  help  messages  and  error  checks),  the  processing 
of  a  request  for  a  report,  the  listing  of  a  report,  and  a  reasonably  complex  computation  involving 
data  retrievals.  OCIS  was  first  installed  in  1976;  this  version  of  the  system  was  reengineered 
from  the  original  system  starting  in  1980,  and  it  has  been  in  continuous  operation  since  1983. 
Thus,  the  current  implementation  of  OCIS  is  a  large  and  complex  information  system  that  has 
been  used  for  a  decade  in  supporting  life-threatening  decisions.  The  remainder  of  this  paper 
examines  the  structure  of  that  system  and  how  the  structure  has  evolved. 

Unlike  the  organization  of  a  system  whose  design  was  guided  by  decomposition,  the 
structure  of  OCIS  is  best  characterized  as  holistic.  The  specification  is  maintained  in  an 
integrated database (called the application database, or ADB) in which the elementary items are
stored  as  fragments  [5].  A  program  generator  operates  on  these  fragments  to  produce  the 
executable  programs.  (A  program  specification  may  cause  the  generation  of  more  than  one 
executable  program.)  Examples  of  higher  level  fragments  include  program  specifications,  relation 
(table)  definitions,  and  data  structures  composed  of  multiple  tables.  Within  the  ADB  there  is 
no  concept  of  either  module  or  file.  A  program  specification  may  reference  tables  and  types 
(e.g., attributes or elements) without explicitly "including" them. In other words, all knowledge
within the ADB is shared by all components of the ADB. The program generator organizes the
ADB  contents  into  modules  for  execution. 
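
To make the contrast with explicit inclusion concrete, the following is a deliberately simplified sketch of the idea: one shared store of specification fragments, program specifications that name tables without importing them, and a generator that decides at generation time what each executable needs. It is not the actual ADB or its program generator; every name, field, and table in it is hypothetical.

```python
# Hypothetical stand-in for an application database (ADB): a single shared store
# of fragments with no notion of module or file at design time.
adb = {
    "tables": {
        "PATIENT": ["patient_id", "name"],
        "VISIT": ["patient_id", "date", "provider"],
    },
    "programs": {
        # Specifications simply name the tables they use; nothing is "included".
        "list_visits": {"reads": ["PATIENT", "VISIT"], "writes": []},
        "book_visit": {"reads": ["PATIENT"], "writes": ["VISIT"]},
    },
}


def generate(adb, program_name):
    """Resolve a program's table references against the whole ADB.

    The grouping into a self-contained unit happens here, at generation time,
    rather than in the design.
    """
    spec = adb["programs"][program_name]
    needed = sorted(set(spec["reads"]) | set(spec["writes"]))
    missing = [name for name in needed if name not in adb["tables"]]
    if missing:
        raise KeyError(f"{program_name} references undefined tables: {missing}")
    return {
        "program": program_name,
        "tables": {name: adb["tables"][name] for name in needed},
    }


module = generate(adb, "book_visit")
print(sorted(module["tables"]))  # ['PATIENT', 'VISIT']
```

Because every specification resolves against the same store, adding a table or attribute makes it immediately available to all existing programs, which is the property the later sections measure.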

By  way  of  contrast,  a  procedure-oriented  implementation  builds  modules  that  define  the 
procedure,  and  it  uses  data  structures  to  support  the  processing.  The  process  has  no  access  to 
knowledge not explicitly specified or included within the module. An object orientation uses a
different modularization technique in which the module is organized around the data type;
operations on the data type are appended explicitly, and data type attributes in the inheritance
chain are included implicitly. In both these module implementations, the module serves to hide
details, and one of the benefits of the module formalism is its application independence, which
is seen to foster reuse. With the environment used to generate the OCIS programs, on the
other hand, modularization is viewed as an implementation (and not a design) concern. Thus
knowledge of how to transform a design into an implementation is reused, and the system design
(i.e., the ADB) is organized as a single, coherent unit. (The ADB can be organized into families
of  applications,  and  segments  of  applications  may  be  copied  [6].) 

How well does this approach work? The design of the present version of OCIS began
in 1980. Since that time, productivity has averaged one production program per effort day. In
a 1992 analysis, the net productivity was 0.7 production programs per effort day (computed by
dividing the number of specifications in the production system by the number of effort days
expended on the project since 1980). All maintenance and new development is the responsibility
of a staff of six. During the past 5 years, that staff has added new functionality to OCIS at the
rate of 20 programs per week. The number of program specifications in the design has increased
by a factor of 2.5 since the system was installed in 1983. Most of these programs represent
functions not available in the initial implementation; some of the new functions augment or
replace system features that were poorly understood at the time of original installation. Thus,
the design of OCIS has evolved as the users' understanding of its operation and potential
matured  [7]. 

The  claim  just  has  been  made  that  the  design  of  OCIS  reached  its  current  state  in 
response  to  a  changing  understanding  of  its  objectives.  That  is,  as  the  model  of  the  real  world 
adjusted  itself  to  take  advantage  of  the  facilities  provided  by  OCIS,  the  design  model  of  OCIS 
reacted  to  accommodate  the  modified  needs.  Because  this  operational  experience  took  place 
over  a  ten  year  period,  one  would  expect  to  find  that  changes  would  take  one  of  two  forms: 

The  addition  of  isolated,  modularized  features.  Here  a  new  requirement  is  identified, 
and a set of programs is specified to meet this requirement. The new function is
relatively independent of the remainder of OCIS; exchange is managed through well-
defined  interfaces.  Most  extensions  to  traditional  architectures  are  of  this  type. 

The  editing  of  existing  features.  In  this  case,  new  requirements  are  met  by  altering  the 
existing  functionality  and  adding  new  programs  as  necessary.  The  resulting  design  cannot 
differentiate between what previously existed and what was just added. The old and new
are deeply integrated, thereby reflecting the structure of the external reality served by
the  system. 

I  now  demonstrate  that  the  growth  of  OCIS  is  an  example  of  the  second  form. 


Demonstration of Integration

Table 1 contains a matrix that counts the number of program specifications by the year
of their initial definition and last editing. If the system additions were highly modular, then one
would expect to find few program specifications edited after the year of their initial definition.
Yet the table presents a very different pattern. In the 13 month period from January 1, 1991
to February 2, 1992 (the date the data were collected), a total of 4,094 programs were brought
into the editor. Of these, 1,049 programs were new programs defined during that period and
the remaining 3,045 programs were edited to accommodate the new programs. That is, in a 13
month period 44% of programs in the 9,257-program system had been either edited or added
by a staff of six full-time equivalents. (In a 1988 study, 1,084 programs were added and 2,170
programs were edited during an 18 month period, which meant that 49% of a 6,605-program
system had been altered.) By way of a side comment, 44% of the program edits were
made by someone other than the original designer, and the median number of program edits
over a 13 year period was 11. Thus, the design team did not experience undue difficulty in
working within this degree of integration.
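
The two percentages quoted above follow directly from the counts in the text; the short calculation below reproduces them. The function name is introduced only for this illustration.

```python
def fraction_changed(added, edited, system_size):
    """Share of the system's programs that were added or edited in the period."""
    return (added + edited) / system_size


# Counts as reported in the text for the two study periods.
study_1992 = fraction_changed(added=1049, edited=3045, system_size=9257)
study_1988 = fraction_changed(added=1084, edited=2170, system_size=6605)

print(f"1992 study: {study_1992:.0%}")  # 1992 study: 44%
print(f"1988 study: {study_1988:.0%}")  # 1988 study: 49%
```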


Year                                  Year of last update
defined    To 1982   1983   1984   1985   1986   1987   1988   1989   1990   1991   1992   Total
To 1982        454    173    111     68    123     81    157    118    132    416    171   2,004
1983                  204    156     48     63     58     64     41     75    188     63     960
1984                         323     70     69     90     43     44     76    235     47     997
1985                                115     97     48     39     34     65    259     52     709
1986                                       250    131    103     61    100    215     57     917
1987                                              186     87     72     83    201     73     702
1988                                                     112     30     74    140     51     407
1989                                                            208    156    288     86     738
1990                                                                   271    363    140     774
1991                                                                          643    254     897
1992                                                                                 152     152
Total          454    377    590    301    602    594    605    608  1,032  2,948  1,146   9,257

Table 1. Distribution of OCIS programs by year defined and year last updated.
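
As a cross-check, the counts quoted in the text (4,094 programs touched, 1,049 new, 3,045 edited) can be derived from the Table 1 matrix itself. The sketch below transcribes the table and recomputes the row and column totals; the variable and function names are introduced here for illustration only.

```python
# Counts transcribed from Table 1. Each row lists, for programs first defined in
# that period, how many were last updated in each period from then through 1992.
PERIODS = ["To 1982"] + [str(year) for year in range(1983, 1993)]
LAST_UPDATE_COUNTS = {
    "To 1982": [454, 173, 111, 68, 123, 81, 157, 118, 132, 416, 171],
    "1983": [204, 156, 48, 63, 58, 64, 41, 75, 188, 63],
    "1984": [323, 70, 69, 90, 43, 44, 76, 235, 47],
    "1985": [115, 97, 48, 39, 34, 65, 259, 52],
    "1986": [250, 131, 103, 61, 100, 215, 57],
    "1987": [186, 87, 72, 83, 201, 73],
    "1988": [112, 30, 74, 140, 51],
    "1989": [208, 156, 288, 86],
    "1990": [271, 363, 140],
    "1991": [643, 254],
    "1992": [152],
}


def totals_by_year_defined(counts):
    """Row totals: number of programs first defined in each period."""
    return {defined: sum(row) for defined, row in counts.items()}


def totals_by_last_update(counts):
    """Column totals: number of programs last updated in each period."""
    totals = dict.fromkeys(PERIODS, 0)
    for defined, row in counts.items():
        first = PERIODS.index(defined)  # a row starts at its own column
        for offset, n in enumerate(row):
            totals[PERIODS[first + offset]] += n
    return totals


defined = totals_by_year_defined(LAST_UPDATE_COUNTS)
updated = totals_by_last_update(LAST_UPDATE_COUNTS)

system_size = sum(defined.values())
touched = updated["1991"] + updated["1992"]  # brought into the editor in the study period
new = defined["1991"] + defined["1992"]      # first defined in the study period
edited = touched - new                       # older programs that were edited

print(system_size, touched, new, edited)  # 9257 4094 1049 3045
```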

Table  2  provides  another  view  of  the  integration  of  OCIS.  It  summarizes  the  relative 
ages  between  a  program  and  the  programs  it  calls,  the  tables  it  reads,  and  the  tables  it  writes. 
As  can  be  seen  in  the  table,  about  20%  of  all  references  to  a  program  or  table  are  to  items  that 
were  defined  at  least  six  months  after  the  referencing  program;  moreover,  half  of  those 
references  are  to  items  defined  2  or  more  years  after  the  referencing  program’s  initial  definition. 
Thus,  older  programs  are  edited  (e.g.,  a  menu  program  is  modified  to  provide  access  to  a  new 
feature), and new capabilities are integrated with those that already exist.
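
The "about 20%" and "half of those" figures can be read off the Table 2 percentages. The sketch below assumes that the nine percentage columns run from items defined more than 2 years before the referencing program (first column) to items defined more than 2 years after it (last column), so that the last three columns cover items defined at least six months later; that column reading is an interpretation of the table layout, not something the text states outright.

```python
# Percentages transcribed from Table 2, one list per kind of reference.
# Assumed column order: >2 yr before, 2-1 yr, 1-1/2 yr, 6-1 mo, within 1 mo,
# 1-6 mo after, 1/2-1 yr after, 1-2 yr after, >2 yr after.
TABLE_2 = {
    "called program": [14, 5, 5, 7, 39, 8, 5, 6, 10],
    "read table": [29, 9, 8, 8, 20, 6, 4, 4, 12],
    "written table": [14, 8, 7, 9, 33, 7, 5, 4, 13],
}


def defined_at_least_six_months_later(row):
    """Percent of references to items defined six months or more after the program."""
    return sum(row[-3:])


def defined_two_or_more_years_later(row):
    """Percent of references to items defined two or more years after the program."""
    return row[-1]


for kind, row in TABLE_2.items():
    print(kind, defined_at_least_six_months_later(row), defined_two_or_more_years_later(row))
# called program 21 10
# read table 20 12
# written table 22 13
```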

                    Years             Months            Years          Number
                 2 <   2-1   1-½   6-1    ±1   1-6   ½-1   1-2   > 2   of Items
Called Program    14     5     5     7    39     8     5     6    10     14,285
Read Table        29     9     8     8    20     6     4     4    12     13,896
Written Table     14     8     7     9    33     7     5     4    13      5,919

Table 2. Relative ages of associated OCIS items as percent of total.
(Date of program definition relative to the date of referenced item definition).

Figures 1-3 depict the same data in a historical context. The relative ages of the
referenced items are grouped by the year that the referencing program first was defined. The
items are organized into five categories: those defined at approximately the same time (± 6
months), those between 6 months and 2 years (before and after), and those more than 2 years
(before and after). References to items defined before the program may be thought of as a kind
of reuse of existing system resources; references to items defined after the program was defined
represent retrofits for integration. The time period covered may be divided into three activities:
1980-1983, development of OCIS; 1984-1986, evolution with constrained resources during user
orientation; and 1987-, mature evolution. Naturally, there were no older items to be referenced
during the first two years of development, and there can be few retrofits for the newer programs.
The data provide the following insights into the structure of OCIS and its evolution.

Relative ages of programs called (Figure 1). Although most program calls are to programs
defined at about the same time, a relatively large number of programs in the original
OCIS system (1983 and before) were updated to reference programs defined significantly
later. In fact, approximately 20% of the calls in the original system are to programs
written at least 2 years after the calling programs. Once OCIS is in production, there
tends to be significant use of "utility" resources that are part of the existing baseline
(e.g., the calls to programs defined two or more years earlier). Nevertheless, in the
mature period, the use of older utilities is almost balanced by the retrofitting of existing
programs.

Relative ages of tables read (Figure 2). As one might expect, the data show that there is
a strong tendency for new programs to read existing tables. The database is an OCIS
resource, and new programs ought to combine existing data with the data defined for the
new features. (For example, new patient-oriented functions will reference the patient
name, which was defined as part of the initial OCIS increment.) It is not intuitively obvious,
however, that the definition of new data structures will affect older programs. Yet,
the data for the development period show that many of the baseline programs were
modified to read data defined considerably after the program (e.g., 23% of the 1983
system’s reads were of tables defined 2 or more years after the program). This retrofitting of
programs to read newly defined data continues at a lesser degree throughout the life of
the system.

Relative ages of tables written to (Figure 3). In a highly compartmentalized system, one
would find that the dates of definition for the tables and the programs that write to (i.e.,
update) them are roughly the same. (For example, in an object-oriented environment,
the dates of object and method definition would tend to be the same.) Yet, as shown
in the figure, one fourth of the original system was modified to update tables defined 2
or more years after the program’s definition. Furthermore, the data indicate that new
programs often write to tables developed considerably earlier (e.g., of the writes in
programs defined in 1986, 35% were to tables defined 2 or more years earlier).

Figure 1. Relative age of programs called by date of calling program definition.

Figure 2. Relative age of tables read by date of reading program definition.
(Chart not reproduced: number of reads by relative age, plotted by the year the reading
program was designed, 1980-1992. Legend categories: more than 24 months before, 6-24 months
before, within 6 months, 6-24 months after, more than 24 months after.)

Figure 3. Relative age of tables written by date of writing program definition.


This analysis of the relative ages of the OCIS programs and the items with which they interact
demonstrates that OCIS is a deeply integrated system. There are few changes that do not
exploit existing system features, and few existing features are unaffected by the introduction of
new capabilities. I assert that this degree of integration is implicit in an information system
specification, but it is the lack of adequate support for development and maintenance that
inhibits its realization. After all, if maintenance is difficult and expensive, then it would be
prudent to localize change and establish clear module boundaries.


Conclusion

This paper has shown that there is at least one large-scale, complex information system
that has operated effectively in a critical environment and that exhibits a non-modular, deeply
integrated design. By conventional standards, the amorphous nature of the system architecture
would be considered a poor design, one that would be very difficult to maintain. Yet the history
of this project is one of high productivity, easy maintenance, and flexibility in responding to
changing needs. Based on the widespread difficulty exhibited by other organizations in
maintaining clinical systems, I do not believe that the success of the OCIS project is related to
the specific domain in which it operates.

In much of my work I have tried to distinguish between the modeling of the software
system intended to meet a need and the modeling of the implementation that will satisfy that
need. Two very distinct classes of model are involved, and if we begin with the wrong model,
then we will be forced to structure our reasoning within the context of that model. Modeling
a problem domain should be holistic when we are concerned with complex, integrated problems;
modeling the solution’s implementation will be modular. Our heritage of hardware development
has taught us to use decomposition to reduce large problems into smaller, more solvable
problems. This approach was reinforced by the early models of human information processing.
Newer, connectionist models recognize the parallelism that permeates the universe. Computer
architecture  is  moving  to  exploit  this  parallelism;  the  emerging  concept  of  concurrent  design  is 
another  of  these  transitions  from  a  serial  to  a  parallel  process. 

The point of this paper, therefore, is that perhaps we are looking at the problem the
wrong way. For many years we have imposed a hardware-oriented discipline on software
production, and the results have been satisfactory. Clearly, we have accomplished a great deal.
Yet, with OCIS, an apparently chaotic organization also seems to work very well. Our challenge
is to understand why it works, so that we can apply its lessons to the tactical systems that the
Navy needs. That, of course, is the primary focus of my research.


INTEGRATION COMPONENTS, SPACES, AND CELLS


Jeffrey  O.  Grady 

General Dynamics Space Systems Division


Setting the Stage for
Integration Decomposition

The specification, concept development, design, and
verification activities for complex systems involving
hardware, software, and human action require intensely
cooperative work among many talented specialized
engineers, none of whom are capable of understanding
the total problem that the system must solve in complete
depth across the complete system. Specialization carries
with it the need for integration of the many small
problem solutions into the total system solution.

The rebirth of the systems approach in the form of IPD
or concurrent engineering stumbles in many companies
due to failure to upgrade communication techniques and
integration processes to the needs of the IPD
environment. These two failures are inter-related and
must be unraveled by clear lines of authority and
responsibility for the teams focused on the organization
of the product and a clear and universal understanding
of what system integration, the subject of much of the
communication that must take place, is.

It would be convenient if we could describe all of the
facets of system integration in terms of a single entity.
The author is convinced, after years of work,
observation, and study in this field, that many of us
accept incorrectly that integration is a single activity.
Few of us are able to describe how that single activity is
performed, however. Many people assign the term
"system integration" a mystical quality. Whatever it is, it
is the answer to every system engineering problem and
seemingly just connecting the term in the same sentence
with a problem is sufficient to define an approach to the
integration process in our proposals and conversations.
We will see in this paper that the integration process can
be decomposed into several parts and these parts
explained in an uncomplicated way. System integration
then becomes the sum of those parts.

We begin with an acceptance that an engineering
organization that must deal with multiple system
development activities should organize in a matrix
structure. The matrix has the advantage of focusing
day-to-day work on specific program problems while
providing a good environment within which to improve
the organization's skills, methods, tools, and knowledge
through continuous process improvement. We should
also accept the good sense that we cannot beat the odds
on specialization. Our engineering organization must
select its personnel from the same pool as everyone else,
humanity. We humans are knowledge limited and we
solve the problem caused by that limitation through
specialization.

Therefore, we will organize our personnel into
functional specialties (departments) led by functional
department Chiefs. These Chiefs will be responsible for
providing all of our programs with qualified personnel,
skilled in using a particular toolset and following the
standard department procedures proven effective on past
programs. We insist on standards that are continuously
improved, because we wish to take advantage of the
practice-practice-practice technique used by great
athletes.

On a given program the product will be organized into
sub-elements that can be worked on by one or more
Integrated Product Development (IPD) Teams and we
will assign our personnel to these teams, which will form
the principal personnel supervisory structure within the
program. The work of all of the IPD Teams on a given
program will be coordinated by a System Engineering &
Integration (SEI) Team.

We will organize all program work into process steps
linked to product entities under the responsibility of one
or more IPD Teams. Each process step will have a set of
goals and a simple task description. We will map our IPD
Team responsibilities to the processes and identify
leaders for each process. All of the processes will be laid
into our integrated schedule with clear start/stop dates
and budgets which reflect back into the IPD structure.
Each process will have clearly identified information
and/or material product outputs that are needed in other
processes as inputs. Also, each process step will have
associated with it a simple completion criterion by which
it is possible to determine when the task is complete.
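
The process-step bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name ProcessStep, its fields, the helper unmet_inputs, and the sample plan are all hypothetical and are not taken from the author's method. The sketch checks one property the text asks for, namely that every input a step needs is produced as an output somewhere in the plan.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One program process step (all field names are hypothetical)."""
    name: str
    team: str                       # responsible IPD Team
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)
    completion_criterion: str = ""  # how we know the task is complete


def unmet_inputs(steps):
    """Return {step name: inputs that no step in the plan produces}."""
    produced = set()
    for step in steps:
        produced |= step.outputs
    return {
        step.name: step.inputs - produced
        for step in steps
        if step.inputs - produced
    }


# Illustrative plan; step names, teams, and criteria are made up.
plan = [
    ProcessStep("identify item X requirements", "IPD Team A",
                outputs={"item X requirements"},
                completion_criterion="requirements approved"),
    ProcessStep("design item X", "IPD Team A",
                inputs={"item X requirements"},
                outputs={"item X design"},
                completion_criterion="design review passed"),
    ProcessStep("test item X", "IPD Team B",
                inputs={"item X design", "test article"},
                completion_criterion="test report accepted"),
]

gaps = unmet_inputs(plan)
```

In this made-up plan the test step needs a test article that no step delivers, so gaps flags it. A gap of this kind is the sort of missing hand-off that integration work is meant to catch early.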

Integration  Components 

These assumptions and selections leave us with the
problem of combining, or integrating, the work of many
people in different functional disciplines, working on
different product system components, in many different
process steps over time. We define these three
fundamental integration components as function,
product, and process respectively. Be very careful that
you understand that we have used the word function
here to mean functional organization and not product
system function.

In each of these components, we have to concern
ourselves with two fundamental kinds of integration:
co-component and cross-component integration. In
addition, we have to account for the possibility that one
or two of the components is not involved. This means
we have a three-valued situation: co, cross, and null
values. With three variables, each with three values, it is
obvious we are working with 27 (3 cubed) different
integration possibilities. It is helpful to have a picture of
this problem to better understand all of these
possibilities.
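
The count of 27 can be checked by enumerating every assignment of a value to each component. The following is a minimal sketch; the tuple names COMPONENTS and VALUES are chosen here for illustration only.

```python
from itertools import product

COMPONENTS = ("function", "product", "process")
VALUES = ("null", "co", "cross")

# Every assignment of one value to each of the three components.
possibilities = list(product(VALUES, repeat=len(COMPONENTS)))

count = len(possibilities)       # 3 ** 3
all_null = possibilities[0]      # no component involved
all_cross = possibilities[-1]    # cross integration on every component
```

Each tuple pairs positionally with COMPONENTS, so the first entry is the all-null case and the last is the all-cross case.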

Unfortunately, we cannot easily illustrate the three-valued
relationship for each of our three components, so our
visual model will disregard the null possibility for each
component. Figure 1 illustrates the three components as
a three dimensional system. We can imagine that we can
assign positions on the function axis to discrete
organizations in our functional organization such as
reliability, structural design, quality assurance, etc.
Similarly, we can assign positions on the product axis to
the elements of the system (avionics system, on-board
computer, circuit card, transistor, etc.) in a hierarchical
fashion. Finally, the third axis can have positions
marked out corresponding to the processes on our
program process diagram, such as identify item X
requirements, design item X, and test item X.


Figure 1 Integration Components.

For a given functional organization, all of the work that
a specialty discipline does can be thought of as being
within the plane passing perpendicular to the function
axis at that function position. The task of integrating all
of the work on that plane is called co-function
integration, the integration of work accomplished by two
or more specialists in that one functional discipline. The
other two variables have null values in this limiting
case.

Similar planes can be constructed perpendicular to the
other two axes at different points on those axes to
capture all of the co-process and co-product integration
work. The careful reader will observe that every point in
the three-space, then, corresponds to some combination
of co and cross component integration for the three
components.

Let us now define the three integration components in
terms of their possible values. Three combinations are
nulls, meaning no integration for that component:
Null-Function, Null-Process, and Null-Product integration.
The other six possibilities are co and cross combinations
with each of the three components. Let us take each of
these six cases in turn assuming, in each case, that the
other two components have a null value for the moment.

Co-Function Integration coordinates the work of two or
more persons from the same functional department
(specialty discipline) to ensure they are all using the
same tools, techniques, and procedures in an appropriate
fashion and that their results are consistent with other
work on one or more programs or systems. This task is
the responsibility of the senior program functional
specialist for the project or the functional supervisor, in
the case where all company projects are the target of the
integration work.

Cross-Function Integration coordinates the work of
persons from two or more functional disciplines in
search of suboptimal design and specialty engineering
solutions needing rebalancing, mutual conflicts between
specialty requirements or the corresponding design
solutions, available unused margin to be repossessed and
applied more effectively, and wayward interpretations of
the project or product requirements that may lead to
conflict. This is a program responsibility that may fall
upon an IPD Team Leader, task principal, or person
from the System Engineering & Integration Team as a
function of the relationship of the work to process or
product and the integration level.

Co-Process Integration coordinates the work of two or
more persons working in the same program process to
ensure that they all are focusing on the same process
needs within available budget and schedule constraints.
This task is accomplished by the assigned task leader
who is responsible to achieve the task goals on time and
on budget.

Cross-Process Integration coordinates the work of two or
more task teams working different processes. It seeks to
ensure that the work of the two or more teams is
mutually consistent and driven by the same program
goals. This integration task is the responsibility of a
program person.

Co-Product Integration coordinates the work of two or
more persons developing the design solution for a
particular product item. This may be to develop an
integrated set of product item requirements, evolve an
optimum synthesis of those requirements in terms of a
design concept, ensure that all of the cooperating
specialty views are respected in the design solution, or
integrate the test, analysis, and design work associated
with that product item. Where an item requires a special
test article, such as a flow bench for a fluid system, this
work should also ensure that the test article properly
reflects the same requirements and design embodied in
the product item.

Cross-Product Integration is the commonly conceived
integration component that most people would first
think of in response to the word integration. It focuses
primarily on integration of interfaces between product
items to ensure that all interface terminals and media
are compatible both physically and functionally. This
work could also be as simple as ensuring that all items
are painted the correct color or have satisfied a particular
specialty requirement, such as being maintainable as
defined by an ability to remove and replace in 10
minutes. It is difficult to talk about this form of
integration without linking other integration
components.

Integration  Spaces 

Integration work seldom falls into one of these pure
cases. Commonly we have to deal with more
complicated situations involving combinations of two or
three components with mixes of cross, co, and null
values. Figure 2 offers four particular examples of these
possible combinations, or spaces, for further discussion.
Once again, we disregard the null value case to enable
simple graphical portrayal in three dimensions on two
dimensional paper.


ISOLATED CROSS-FUNCTIONAL, CO-PRODUCT, CO-TASK INTEGRATION WITHIN
AN IPD TEAM ON A SPECIFIC TASK

ISOLATED CO-FUNCTIONAL, CROSS-PRODUCT, CO-TASK INTEGRATION BY
SEI TEAM WITHIN A SPECIFIC TASK

CROSS-FUNCTIONAL, CO-PRODUCT, CROSS-TASK INTEGRATION BY A
PRODUCT IPD TEAM

CROSS-FUNCTIONAL, CROSS-PRODUCT, CROSS-TASK INTEGRATION
FOR THE WHOLE PROGRAM


Figure 2 Integration Space Examples.


In Table 1 we use a simple ternary counting scheme to
ensure we have not omitted any combinations of the
three variables. There can be no other forms of
integration work than those listed in Table 1 given that
we have included every appropriate integration
component. It would be possible to add one or more
components to our list in addition to function, process,
and product. The effect would be to multiply the number
of different integration spaces. The number of
integration spaces (S) is predictable as follows:


S = n^C

where:

C = the number of integration
components and

n = the number of values for each
variable.


Table 1 Integration Space Identification.

ID  FUNCTION  PROCESS  PRODUCT  INTEGRATION TYPE NAME

 0  NULL      NULL     NULL     INDIVIDUAL EFFORT
 1  NULL      NULL     CO       ISOLATED CO-PRODUCT
 2  NULL      NULL     CROSS    ISOLATED CROSS-PRODUCT
 3  NULL      CO       NULL     ISOLATED CO-PROCESS
 4  NULL      CO       CO       CO-PROCESS & PRODUCT
 5  NULL      CO       CROSS    CO-PROCESS/CROSS-PRODUCT
 6  NULL      CROSS    NULL     ISOLATED CROSS-PROCESS
 7  NULL      CROSS    CO       CROSS-PROCESS/CO-PRODUCT
 8  NULL      CROSS    CROSS    CROSS-PROCESS & PRODUCT
 9  CO        NULL     NULL     ISOLATED CO-FUNCTION
10  CO        NULL     CO       CO-FUNCTION & PRODUCT
11  CO        NULL     CROSS    CO-FUNCTION/CROSS-PRODUCT
12  CO        CO       NULL     CO-FUNCTION & PROCESS
13  CO        CO       CO       ALL CO
14  CO        CO       CROSS    FOURTEEN
15  CO        CROSS    NULL     CO-FUNCTION/CROSS-PROCESS
16  CO        CROSS    CO       SIXTEEN
17  CO        CROSS    CROSS    SEVENTEEN
18  CROSS     NULL     NULL     ISOLATED CROSS-FUNCTION
19  CROSS     NULL     CO       CROSS-FUNCTION/CO-PRODUCT
20  CROSS     NULL     CROSS    CROSS-FUNCTION & PRODUCT
21  CROSS     CO       NULL     CROSS-FUNCTION/CO-PROCESS
22  CROSS     CO       CO       TWENTY TWO
23  CROSS     CO       CROSS    TWENTY THREE
24  CROSS     CROSS    NULL     CROSS-FUNCTION & PROCESS
25  CROSS     CROSS    CO       TWENTY FIVE
26  CROSS     CROSS    CROSS    ALL CROSS
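
The ID column of Table 1 behaves like a three-digit base-3 number, with FUNCTION as the most significant digit and PRODUCT as the least. The sketch below makes that reading explicit. The digit assignment NULL = 0, CO = 1, CROSS = 2 is inferred from the row ordering of the table, and the function names space_id and space_values are hypothetical.

```python
DIGITS = ("NULL", "CO", "CROSS")  # assumed ternary digits 0, 1, 2


def space_id(function, process, product):
    """Encode three component values as a Table 1 ID (0 to 26)."""
    f, p, q = (DIGITS.index(v) for v in (function, process, product))
    return 9 * f + 3 * p + q


def space_values(space):
    """Decode a Table 1 ID back into (function, process, product)."""
    if not 0 <= space <= 26:
        raise ValueError("ID must be between 0 and 26")
    f, rest = divmod(space, 9)
    p, q = divmod(rest, 3)
    return DIGITS[f], DIGITS[p], DIGITS[q]
```

Under this assumption, space_id("CO", "CO", "CO") yields 13 (ALL CO) and space_values(19) yields CROSS, NULL, CO, matching the CROSS-FUNCTION/CO-PRODUCT row. Counting from 0 to 26 therefore visits every combination exactly once, which is the completeness argument the text relies on.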

If, for example, we concluded that there should be five
components instead of the three covered in this paper,
and three values (co, cross, and null) for each
component, we would have 3^5 = 243 integration spaces.
As you can imagine this could get out of hand very
rapidly, making the description of integration more
complex than the work itself. Three components and
three values appeared to the author to be a good
compromise between completeness and
understandability.

Each integration space involves a merger of a unique
grouping of three particular values for the components
and represents one integration mode useful in one or
more particular situations.

Of the 27 integration spaces, we can dispense with
INDIVIDUAL EFFORT integration immediately. We
must assume that a single individual is fully capable in
their field of specialized knowledge and able to
effectively apply this knowledge to a specific product
item and in a particular single process. This is why we
specialized, after all, to create a human task that is
within the power of a normal, single, specialized
individual to master. We assume that each specialist can
carry on an internalized conversation with themselves
and use the power of their specialized discipline to solve
the small problems we have tried to frame in our
decomposition efforts.

Similarly, the ALL CO integration space is usually not
very interesting to a system engineer since it involves
people from the same functional department performing
work in the same process step, for the same system
element.

One other fairly simple integration space to explain,
though the most complex of them all in practice, is ALL
CROSS integration involving cross everything. When a
member of the SEI team integrates the work of several
members of two IPD teams developing the design of two
different product elements and the work of a facilities
engineer responsible for the factory that will assemble
these two elements and a tooling designer responsible
for the manufacturing equipment that will hold the
elements during mating, we have an example of this
kind of integration. The reader will be able to imagine
several other cases of this integration space.

We have given examples now for nine of the 27
integration spaces, including the six isolated integration
cases, INDIVIDUAL EFFORT (or ALL NULL), ALL
CO, and ALL CROSS. This leaves 18 remaining to be
explained with at least one example. Some of these 18,
you can see from Table 1, are very difficult to name so
we will simply use the ID number for a name.

Integration  Cells 

In  almost  any  given  system  development  work  situation, 
we  will  find  it  necessary  for  some  combination  of  the  27 
integration spaces to be applied. Very little system
development  work  can  be  accomplished  in  total 
autonomy  today.  This  phenomenon  appears  because  we 
have  had  to  specialize  very  finely  to  master  enough  of 
the  available  knowledge  base  to  be  competitive.  The 
number  of  these  combinations  is  finite  but  quite  large  for 
a  large  development  program. 

We  have  many  options  in  grouping  all  program  work 
into  unique  combinations  of  integration  spaces  but  the 
best way to do this is probably driven by the program
tasks that would appear on a program task network.
Each program task has associated with it some set of
functional disciplines performing work on some
particular  combination  of  product  elements.  Many  of 
these  tasks,  possibly  most,  will  require  some  form  of 
integration  in  the  context  of  some  combination  of  the  27 
integration  spaces  defined  above.  Let  us  call  any  one  of 
these  combinations  of  task,  product,  and  functional 
organization  an  integration  cell. 

The more finely we divide the overall program into
tasks, the more unique integration cells we will have.
The more finely we assign IPD teams to develop the
system, the larger the number of integration cells. The
more functional disciplines that must be assigned to the
program, the more integration cells that will be
necessary. At the same time, the larger the number in
any of these integration components, the more simply
we can describe each integration cell because they will
consist of a less complex combination of integration
spaces.

Program  World  Line 

So, the integration process is more complex than we
might have first imagined, composed of a finer structure
than we might have thought. But the complexity does
not stop here. We must not only apply each of these
integration spaces well within the context of the
integration cells defined for the program, but we must
apply them in a pattern coordinated with the program
schedule. Figure 3 illustrates this need by placing the
integration spaces coordinate system on a program
world line. Throughout program passage on its world
line (network or schedule), appropriate specialists must
apply the appropriate integration spaces to the planned
integration cells in accordance with an integration plan.
To the extent that we can reduce aggregate program
work through early identification and resolution of
inconsistencies between product and process elements,
we encourage success in system integration. Ideally, we
should be able to do all work error-free on the first pass.


Figure 3 System Development World Line.

Now we can answer the question, "What is system
integration?" It is the rich mixture of three integration
components applied in combinations, defined by the
resultant integration spaces, to the work confined to a
finite number of integration cells across the program
world line that actually comprises system integration on
any given program.




Complex Systems Engineering Synthesis and Assessment Technology Workshop, July 20-22, 1993


An Efficient Approach to Systems Evolution (EASE)

Thomas  C.  Choinski,  (203)  440-5391 
John  G.  DePrimo,  (203)  440-5723 
Naval  Undersea  Warfare  Center  Detachment,  New  London,  CT 

"Today, we are in the final stages of a true "industrial revolution" in computer hardware and software which has
totally transformed the industry. Navy has no choice but to adopt these changes. By leveraging the industrial
revolution, Navy can reap great benefits in increased capability and decreased cost. If the Navy continues to
develop unique product lines, it will likely face spiraling costs while falling further behind in capability."^

NRAC  91-1  Office  of  ASN  RD&A 


INTRODUCTION 

Political changes throughout the world, fiscal
constraints set by the United States Congress and a
shift in corporate product planning have created the
need  for  a  new  acquisition  environment  within  the 
Department  of  Defense  (DoD).  The  Director  of 
Defense  Research  and  Engineering  (DDR&E)  has  set 
guidelines  to  direct  these  changes.^  The  guidelines 
consist  of  7  Thrust  Areas.  The  concept  for 
transitioning  technology  described  herein  specifically 
addresses  Thrust  Area  4  (Undersea  Superiority)  and 
Thrust  Area  7  (Affordability). 

DoD must facilitate the process used to transition
technology into warfare systems to achieve affordability
objectives. No longer can DoD build warfare systems
focused on delivering increased performance to oppose
specific, pre-defined threats. Today's warfare systems
must be able to adapt to unpredictable world situations
similar to the Gulf War.

The Efficient Approach to Systems Evolution
(EASE) concept engendered a vision for developing
warfare systems in the future. The concept seeks to
make this vision a reality by capitalizing on key
enabling technologies. EASE has set out to:

Foster  the  application  of  fast  paced, 
emerging  commercial  technology  to 
the  development  and  acquisition  of 
affordable  warfare  systems. 

Figure 1. Product Versus Platform Life Cycles


EASE will introduce new products and processes
to transition technology in an affordable manner. The
development of computer aided system engineering
design tools and the strong emergence of high
performance computing and communications (HPCC)
technology will make the transition feasible. This
paper discusses these components further in the
Vision and Technology Transition sections.

The HPCC Program, funded by the Advanced
Research Projects Agency (ARPA) through the
Federal HPC Program,^ offers the potential for
TERAFLOP (1 trillion floating point operations per
second) computing capability. EASE will use HPCC
technology as an enabling force to showcase potential,
future alternatives to the military system acquisition
process.

This paper outlines the EASE concept. The paper
discusses: the three primary issues addressed by EASE,
the new vision for transitioning technology, and the
specific warfare system applications selected to
showcase EASE.

ISSUES 

The  EASE  concept  focuses  on  three  issues 
systems  engineers  confront  when  transitioning 
technology  into  warfare  systems:  commercial  product 
life  cycles,  DoD  policy,  and  the  tradeoff  between 
general commercial based processing and application
specific  (point  technology)  resources. 

The first issue concerns the changing rate of new
product development in the commercial sector.
Twenty years ago the commercial sector distributed
computer products with life cycles greater than 10
years. Digital Equipment Corporation's (DEC) VAX
computer exemplifies this phenomenon. DoD could
transition products with 10 year life cycles easily,
since the 10 year cycles minimized the number of
technology upgrades for Navy platforms used for 30
years (Figure 1).
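
The relationship between product life cycle and upgrade count can be made concrete with a rough calculation. The sketch below assumes a deliberately simplified model in which a platform is fitted once and then refreshed each time a product generation ends. The function name upgrades_needed and the 5 and 1 year comparison values are illustrative assumptions; only the 10 year cycle and the 30 year platform life come from the text.

```python
import math


def upgrades_needed(platform_life_years, product_life_years):
    """Technology refreshes after initial installation (simplified model)."""
    generations = math.ceil(platform_life_years / product_life_years)
    return generations - 1


PLATFORM_LIFE = 30  # years, as stated in the text

# Upgrade counts for a 10 year cycle and two shorter, assumed cycles.
results = {
    cycle: upgrades_needed(PLATFORM_LIFE, cycle)
    for cycle in (10, 5, 1)
}
```

Under this model a 10 year product cycle implies 2 upgrades over the platform's life, a 5 year cycle implies 5, and a 1 year cycle implies 29, which shows how quickly the upgrade burden grows as commercial cycles shorten.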

However, today's commercial computer companies have
accelerated their time to market.^ As a result, product
life cycles have decreased to less than 5 years. In some
cases, life cycles last less than 1 year. DoD has difficulty
transitioning these rapidly emerging commercial products
into warfare systems. Shortened commercial product life
cycles require frequent technology upgrades. The current
acquisition process cannot keep up with the rate of
change of technology.

The second issue deals with DoD Policy. For
example, the Navy has changed the policy concerning
the utilization of commercial off-the-shelf (COTS)
products. In the past, the Navy demanded
militarization of equipment for shipboard use. The
Navy granted waivers for using commercial equipment
in military systems. DoD's aforementioned focus on
affordability has reversed this policy. Accordingly,
MIL-STD 2036 stipulates the requirement to use
non-developmental items (NDI) and COTS. Program
managers must justify using rugged and militarized
equipment with an operational, service or economic
consideration. This policy essentially compels a cultural
change within the Navy. Figure 2 illustrates the process
involved in MIL-STD 2036.^5

Figure 2. MIL-STD 2036 Policy

The third issue addresses a trend in warfare system
architectures. Analysis of warfare systems that have
evolved over the last thirty years uncovers the trend
toward the increased utilization of general purpose
commercial based processors. Figure 3 illustrates this
trend.

Figure 3. Commercial Based Processing Trend

Warfare systems designed during the 1960's
primarily used point technologies (i.e., special
purpose designs) to achieve increased performance.
During the 1980's, engineers designed warfare systems
that incorporated commercial based processors, like
Motorola's 68030, although designers placed these
processors on militarized circuit cards.
Consequently, resulting architectures consisted of an
amalgam of commercial based technologies and
special purpose designs. Once again engineers
concentrated on performance.

The next step continues the trend towards general
purpose commercial based processing technology.
One alternative uses an emerging technology, like the
one promoted by ARPA's High Performance Computing
and Communications Program. In fact, the
vision propagated by EASE uses HPCC technology
as an enabling force.

VISION 

The  first  step  in  achieving  the  EASE  vision 
entails the creation of a "homogeneous" warfare
system.  HPCC  technology  offers  one  way  to  obtain 
this  goal. 

The homogeneous warfare system sets the stage for
affordability. Figure 4 illustrates how each function in a
warfare system (e.g., acoustics, communications,
photonics, etc.) can use a generic HPCC based
processor; hence, the homogeneous nature of
the architecture.

Note, however, that heterogeneous processing
resources do exist within a given function. For example,
the acoustics function contains a receiver, HPCC
computer and workstation. Similarly, HPCC technology
also contains heterogeneous processing resources.

The homogeneous warfare system achieves
affordability by using one software development
environment for each function, and by minimizing the
number of card types across the system. The warfare
system also can reconfigure computing resources for
low duty cycle functions. The applications section
discusses reconfiguration.

General  purpose 
commercial  based 
processing  fosters 
the  transition  of 
future  technology. 

The warfare system can use either commercial or rugged equipment, but
land based development facilities incorporate commercial equipment.




each able to leverage the process and product innovations of the
original product.

The homogeneous warfare system provides the foundation to transition
technology in the same manner as the personal computer analogy.
However, transitioning technology demands more than an architectural
foundation. Fiscal constraints mandate DoD's need to revitalize the
transition process.
Figure 4. Homogeneous Warfare System

Scientists and engineers use the land-based facility to develop new
algorithms and new functions while performing R&D. Under the scenario
where only minor differences exist between the architecture in the
laboratory and the system at sea, systems engineers can efficiently
port the software to the warfare system. Utilization of an HPCC
software development environment enables software portability into
the warfare system described in Figure 4. Figure 5 depicts this
efficient process.

On the other hand, systems engineers can also transition new hardware
and software advances from the commercial sector as shown in the lower
right hand corner of Figure 5.

Succeeding 
generations  of 
commercial 
product  families 
can  transition  into 
the  warfare 
system. 

The IBM PC serves as an analogy. IBM tied their personal computer to
Intel's microprocessor family. The 80486 replaced the 80386SX; the
80386 succeeded the 80286. Over the life of the 486 product, Intel
will introduce a host of derivative products, each offering some
variation in speed, cost, and performance and

Figure 5. The Vision


TECHNOLOGY TRANSITION

The DoD Research and Development Management Guide discusses the push
and pull of technology. The technology push involves advances in the
state-of-the-art which occur in industry independent of emerging
operational requirements. The technology pull encompasses advances in
operational requirements which surface apart from the state-of-the-art
in technology. The EASE concept introduces a third category into this
paradigm, the pace of technology.

The pace addresses the rate at which the DoD can transition new ideas
from the push and pull categories into the fleet. DoD needs to create
an environment which simplifies the transition of emerging
technologies. Figure 6 proposes a Technology Insertion Environment
(TIE) to accomplish this objective.
Commercial trends suggest a 3 year transition cycle; in other words,
technology should transition within 3 years of introduction from
either the push or the pull categories. The 3 year transition period
requires moving a warfare system product from the fuzzy front end of
development to a well defined implementation decision. The fuzzy front
end describes the beginning phase of product development. During this
phase a product's requirements may be ambiguous. Moving to a well
defined implementation decision will require the completion of the
fundamental tradeoff analysis between development cost, time, life
cycle cost and performance. The Technology Insertion Environment
expedites the tradeoff analysis.

The Technology Insertion Environment can transition research and
development (R&D) that replaces older generations of technology;
likewise, the TIE can focus on technology fusion.

Technology fusion refers to the process of combining technologies in a
hybrid fashion. This approach applies particularly to combinations
from the push and pull categories, as well as hardware and software.

The picture in Figure 6 portrays the Technology Insertion Environment
as pieces of a puzzle. These pieces provide the infrastructure to
transition technology. The pieces include but are not limited to:
logistics, concurrent engineering, rapid prototyping, requirements
traceability and simulation. The EASE concept has identified numerous
pieces to the puzzle, but has yet to determine the appropriate fit for
the pieces. DoD needs to research this area to mature the concept. The
Engineering of Complex Systems Block developed by the Naval Surface
Warfare Center, and sponsored by the Office of Naval Research, has
developed tools suitable for the Technology Insertion Environment.
Successful technology transition depends on the Technology Insertion
Environment.

The EASE concept provides a novel process for transitioning
technology. Many products could showcase the EASE concept. The
following section introduces two specific warfare system products
which focus on affordability.

Figure 7. Application Selection Criteria

WARFARE  SYSTEM  APPLICATION 

An assortment of warfare system applications could showcase the EASE
concept. Figure 7 outlines the metrics used to guide the selection
process. The EASE concept could have included (and ultimately may
include) surface ship or air applications. Indeed, EASE has nearly
universal military application. Exceptions include functions like
cryptography which remain special purpose. Examples of products
considered include expanded TB-29 towed array processing. TB-29
focused beamforming and automated contact followers offer dramatic
manning reductions relative to current processing techniques. This
application would necessitate substantial processing throughput by
concurrently processing thousands of sonar beams. ESM (Electronic
Warfare Support Measures) presented another consideration, since its
processing requirements are very similar to acoustics (i.e., A/D
conversion, signal processing, and data processing). However, ESM's
signal processing (e.g., FFTs, correlation, and feature extraction)
requires on the order of 100 MIPS. Some acoustic sensors require
higher processing rates than ESM.

The processing required for the Wide Aperture Array (WAA) and
photonics (i.e., periscope processing) have optimum attributes to
showcase EASE. These disparate, multi-GFLOP applications can
demonstrate virtually any submarine or surface application through
extrapolation. For example, one of three monochrome sensors in the
ARPA developed non-penetrating periscope (NPP) forms an image 1024 x
1024 pixels by 10 bits/pixel, or about 1.3 Mbytes/image, and captures
[illegible] images/second. Combined with advanced image processing
algorithms involving enhancement and/or replica correlation, this
massive quantity of data yields a 10 GFLOP minimum real-time
requirement.

Table I. WAA Processing Requirements
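
The image size quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the frame rate is deliberately left as a caller-supplied parameter because the source does not give a legible value.

```python
# Illustrative arithmetic for the NPP sensor figures quoted in the text.
# Function names are hypothetical; only the pixel counts and bit depth
# come from the paper.

def image_size_bytes(width_pixels: int, height_pixels: int, bits_per_pixel: int) -> float:
    """Return the raw size of one image in bytes."""
    return width_pixels * height_pixels * bits_per_pixel / 8


def sensor_data_rate_bytes(bytes_per_image: float, images_per_second: float) -> float:
    """Return the raw data rate for one sensor (frame rate supplied by the caller)."""
    return bytes_per_image * images_per_second


npp_image_bytes = image_size_bytes(1024, 1024, 10)
print(f"{npp_image_bytes / 1e6:.2f} Mbytes/image")
```

The result, roughly 1.3 Mbytes per image, agrees with the figure given in the text.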

The WAA sensor consists of six panels of elements attached to the side
of a submarine. The number of elements presents a heavy computational
load. Other reasons led to the selection of WAA besides the heavy
computational load for beamforming and detection processing shown in
Table I. For example, WAA has greater potential as a shallow water
sensor than towed arrays. Towed arrays constrain the maneuvering and
speed of a submarine to avoid dragging the array on the bottom. In
addition, the frequency response of the WAA sensor is well "tuned" to
emerging post-cold war contacts of interest. The implementation, sized
in Table I, expands functionality: (1) full detection coverage
excluding baffles in azimuth and D/E; (2) a floating point vice
one-bit clipped ("DIMUS") beamformer to enable enhanced medium
frequency active receive and improved detection in cluttered
environments; and (3) interpolated detection beamforming to enable the
addition of automated contact followers for reduced manning
requirements. When referring these attributes back to the Figure 7
application selection criteria, the reader will find close agreement.

NUWC and Intel engineers constructed the data in Table I. NUWC
described the signal processing, and computed the sustained throughput
requirements based upon sample rates, the number of beams, and the
number of sensors. The sustained throughput requirement approaches 12
GFLOPS, most of which the beamformer consumes. Colleagues at Intel
benchmarked the efficiencies. Intel executed "static" benchmarks to
compute the efficiencies (i.e., computational nodes were not
interconnected). Therefore, Table I efficiencies do not account for
operating system, I/O, and TADSTAND demands. NUWC's experience with
massively parallel processing through the Advanced Sonar Signal
Processor Architecture Project (ASPA) indicates systems engineers can
expect to achieve 10% efficiencies in situ.

In any event, Intel did use FORTRAN instead of

They used a high level language to point out the high level
programming capability which lends itself to portability of
application software. Figure 8 depicts an example involving "porting"
applications from a commercial Intel Paragon to the Honeywell
ruggedized unit with minimal rewrite of the software. The EASE concept
includes the portability of software as a major feature.

The current Intel XP/S-35 has a peak throughput which matches the
requirements outlined in Table I. Honeywell expects to increase
processing density to 10 GFLOPS/cu. ft. in FY94, and to 100
GFLOPS/cu. ft. by FY98, through packaging and water cooling. With
these processing densities: (1) the total size of a warfare system
could shrink considerably; and (2) low processing efficiencies become
less significant (e.g., 10% as stated above).
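
A rough sizing calculation shows why higher processing density makes low efficiency less of a concern. The sketch below assumes that efficiency is the ratio of sustained to peak throughput and that the quoted densities refer to peak GFLOPS per cubic foot. Both are assumptions made for illustration, and the function names are hypothetical.

```python
# Illustrative sizing arithmetic using figures quoted in the text:
# about 12 GFLOPS sustained, 10% in situ efficiency, and projected
# densities of 10 and 100 GFLOPS per cubic foot.
# Assumption: efficiency = sustained throughput / peak throughput.

def required_peak_gflops(sustained_gflops: float, efficiency: float) -> float:
    """Peak throughput needed to sustain the given rate at the given efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return sustained_gflops / efficiency


def processor_volume_cu_ft(peak_gflops: float, density_gflops_per_cu_ft: float) -> float:
    """Volume of processing hardware needed at a given packaging density."""
    return peak_gflops / density_gflops_per_cu_ft


peak = required_peak_gflops(12.0, 0.10)
for label, density in (("FY94", 10.0), ("FY98", 100.0)):
    volume = processor_volume_cu_ft(peak, density)
    print(f"{label}: {peak:.0f} GFLOPS peak fits in about {volume:.1f} cu. ft.")
```

Under these assumptions, a tenfold gain in density offsets a 10% efficiency exactly: the hardware volume at 100 GFLOPS/cu. ft. and 10% efficiency equals the volume at 10 GFLOPS/cu. ft. and 100% efficiency.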

The priorities of the Navy have changed, leading to a broad assessment
of the future direction of United States' Maritime forces. With the
end of the Cold War, U.S. military strategy has shifted from
deterrence of global conventional and nuclear war to the protection of
vital national interests in regional crises, contingencies, and
conflicts. While all elements of the Armed Forces have important roles
to play, America's nuclear-powered submarines are a decisive component
of the new military strategy. The submarine will help provide the
flexible forward presence and crisis response capabilities that have
been the cornerstone of our national defense.

The requirement for total warfare system flexibility provides the key
to accommodating this evolution.

Warfare system flexibility is the ability to provide mission enabling
functional emphasis in those areas demanded by specific deployment
assignments. Core requirements for ship's safety and self-protection
require the applications in the left pie chart of Figure 9. Shifting
to an Indication and Warning (I&W) mission invokes specific
requirements which could differ significantly, for example, from a
traditional Cold War anti-submarine warfare (ASW) mission.

This approach to warfare system implementation provides the ultimate
flexibility: the ability to customize, through reconfiguration, the
total available processing resources as mandated by its mission.

SUMMARY 

The Naval Undersea Warfare Center has undertaken the EASE initiative
to foster the application of fast paced, emerging commercial
technology to the development and acquisition of affordable warfare
systems. The initiative consists of new products and processes to
foster the transition of new technologies into the fleet.

New processes center on the creation of a Technology Insertion
Environment which will provide the tools to make technology
transitions practicable. The environment will address concerns
including: logistics, concurrent engineering, rapid prototyping,
requirements traceability, and simulation.

To showcase the process, the EASE concept includes the development of
two products using High Performance Computing and Communications
technology. The Wide Aperture Array and the Non-Penetrating Periscope
provide the applications for HPCC technology. These applications
individually require more than 10 GFLOPS sustained processing
throughput and together demonstrate the suitability of HPCC technology
for a broad range of DoD real-time processing functions like image
processing, beamforming, database management and traditional signal
processing.
AN OVERVIEW OF THE PROCESSING GRAPH
SUPPORT ENVIRONMENT

Roger Hillson

Naval Research Laboratory

Washington, DC 20375-5000

Abstract

The Navy has developed a data flow method for programming networks of
processors. This approach, called the Processing Graph Method (PGM),
is now being used to develop signal processing applications for the
Navy's second-generation tactical signal processor. At the Naval
Research Laboratory, a unified set of software tools has been
developed to facilitate PGM programming. A Macintosh-based Graphic
Entry Workstation (GEWS) can be used to iconically capture processing
graphs which are then automatically translated into Signal Processing
Graph Notation (SPGN). The Processing Graph Support Environment (PGSE)
is a set of Ada software utilities for compiling, linking, and
executing the processing graphs; it includes a large, user-extensible
library of signal processing primitives. PGSE is now available for VAX
systems running VMS and for Sun-4 workstations under SunOS 4.1.3. A
simple signal processing application is developed to demonstrate the
utility of PGSE, and current enhancements to the system are discussed.
GEWS and PGSE are both available from the Naval Research Laboratory.

1.  Introduction 

Many commercial products now support iconic data-flow methods for
programming signal processing applications [1]. Over the past decade,
the Navy has also developed a data-flow method for programming
multiprocessors. This approach is called the Processing Graph Method
or PGM [2, 3, 4, 5, 6], and is designed to reduce the cost and
complexity of signal processing application development. In PGM,
signal processing applications are developed as directed data-flow
graphs (or processing graphs) and command programs. The numerical
computations are realized by elementary signal processing functions
underlying the processing graph nodes, while the command programs
provide a mechanism for dynamically managing graph execution,
including graph initiation, termination, and reconfiguration. Graphs
are deterministic, while command programs mediate inherently
non-deterministic responses to operator input.

The PGM specification [2] defines the grammar and syntax for Signal
Processing Graph Notation (SPGN), an intermediate language for
specifying processing graphs in a text format. PGM also defines the
necessary functionality for command program procedures which are
implemented as extensions to a host-specific high-order language.
Other parallel languages can also be applied to signal processing
[e.g., 7, 8, 9], but PGM retains a number of unique advantages:

(1)  PGM includes a mature and stable intermediate
language for data-flow graphs.

(2)  PGM functionally specifies control procedures
which can be embedded in a high-order language
appropriate for the host processor.

(3)  Parallelism is implicit in the PGM description of a
data-flow graph. The programmer does not have to
explicitly indicate parallelism through the use of
semaphore constructs, etc.

(4)  PGM is architecture independent.

(5)  PGM is the only data flow method to be fully
implemented on a tactical military signal processor,
the AN/UYS-2A [10].

The Processing Graph Support Environment (PGSE) is a complete
implementation of PGM [11]. The run-time shell includes a large
user-extensible library of over 125 signal processing primitives. PGSE
is designed to facilitate the multi-user development and testing of
processing graphs. It requires as input a set of SPGN files, data
files, and either a command program or a sequence of interactive
control instructions. With the exception of an optional shell for
executing AN/UYS-2A command programs, PGSE does not incorporate any
machine-specific architecture models.

2.  The  Processing  Graph  Method 

A complete description of the Processing Graph Method is beyond the
scope of this article. Additional information can be found in the PGM
Specification [2] and Tutorial [3]. The Pictorial Processing Graph
Standard [5] describes the iconic conventions for PGM graphs.

2.1 Processing Graph Entities. A processing graph is a directed graph
in which the vertices correspond to either nodes or subgraphs, and the
directed edges correspond to queues. Subgraphs are themselves
processing graphs. Queues, which convey data between the nodes, are
strongly-typed first-in first-out (FIFO) data structures. The head of
the queue is designated by an arrow, which also specifies the
direction of data flow through the queue. Data are placed on the tail
of the queue, and removed from its head. A queue can be connected to
only one node at its head and only one node at its tail. A dynamic
queue is a queue created and controlled by a command program. A graph
variable (GV) contains a single data element, although this element
may itself be a multidimensional array. A GV can be simultaneously
attached to one or more nodes, and is used to broadcast data
throughout the graph. Its value can be modified at runtime by the
output from a node or a command program.

Nodes are the scheduled entities within a processing graph. Each node
must have at least one input port, although it is not required to have
an output port. Queues and graph variables are attached to the node's
ports, and must be of the same data mode (type) as the port. Each node
has an underlying elementary computational process called a primitive.
Examples of signal processing primitives are Fast Fourier Transforms
(FFTs), filter and bandshift functions, and elementary vector and
matrix operators. Data will be passed to and from the primitive during
node execution.

Associated with each node port is a set of Node Execution Parameters
(NEPs). These parameters determine how the data on the queues are to
be used. There are five NEPs for input queues: threshold amount, read
amount, consume amount, offset amount, and valves. When the number of
data elements placed on each of a node's input queues is greater than
or equal to the threshold amount for each queue, the node is ready to
execute. The read amount specifies the number of elements that are
read by the node before the underlying primitive executes; the offset
amount is the number of elements which are skipped before reading the
data. The consume amount specifies the number of data elements which
are removed from an input after the node executes. A valve has some
integer value; if this value is zero, no data is placed on the queue.
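
The firing rule described above can be sketched in a few lines of Python. This is a minimal illustration and not PGSE code (PGSE itself is written in Ada): the class and function names are hypothetical, valves are omitted, and the queue is modeled with a standard double-ended queue.

```python
# Minimal sketch of the NEP firing rule for input queues.
# InputPort, node_ready, and read_and_consume are hypothetical names
# introduced for illustration; valves are not modeled.
from collections import deque
from dataclasses import dataclass, field
from itertools import islice


@dataclass
class InputPort:
    threshold: int  # elements that must be queued before the node may execute
    read: int       # elements passed to the primitive
    offset: int     # elements skipped at the head before reading
    consume: int    # elements removed from the head after execution
    queue: deque = field(default_factory=deque)


def node_ready(ports):
    """A node is ready when every input queue holds at least its threshold."""
    return all(len(port.queue) >= port.threshold for port in ports)


def read_and_consume(port):
    """Return the window read by the primitive, then remove consumed elements."""
    window = list(islice(port.queue, port.offset, port.offset + port.read))
    for _ in range(port.consume):
        port.queue.popleft()
    return window


port = InputPort(threshold=8, read=8, offset=0, consume=4)
port.queue.extend(range(6))
ready_before = node_ready([port])   # only 6 of the 8 required elements queued
port.queue.extend(range(6, 10))
ready_after = node_ready([port])    # 10 elements queued, threshold met
window = read_and_consume(port)
remaining = list(port.queue)
```

Setting the consume amount below the read amount, as in this usage example, leaves part of each window on the queue, so that successive executions read overlapping data.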

2.2 Command programs. Unlike processing graphs, command programs are
implementation-dependent on the PGM host. Chapter 5 of the PGM
specification specifies the Command Program (CP) functionality
required for a PGM implementation [2]. A syntax is suggested, but it
is not mandatory. CP procedures alone are not sufficient to write
command programs, for these procedures must be incorporated into a
high-order language (HOL) specific to the target machine. Command
program functions permit the user to start and stop the graph, to
enter operator specified parameters, to link and unlink queues and
graph variables, and to flush the data from a dynamic queue prior to
restarting the graph. The following list summarizes the major command
program functions:

•  Declare the initial CP or a spawned CP and
its formal parameters.

•  Spawn one or more instances of command
programs from the initial CP or one of its children.

•  Start or stop a graph.

•  Completely reinitialize a graph instance, and
resume its execution.

•  Create and destroy dynamic queues and graph
variables.

•  Initialize a dynamic queue and add data to it, read
data from it, or flush the data entirely.

•  Read and write to dynamic queues and graph
variables.

•  Connect or disconnect a CP to the head
or tail of a dynamic queue, or link or unlink a
dynamic queue from the port of a graph or I/O
procedure.

•  Wait for the completion of a procedure to read a
queue.

•  Initialize I/O procedures, associate them
with dynamic queues, and start and stop the I/O
processes.

3. The Processing Graph Support Environment (PGSE)

The Processing Graph Support Environment is a set of Ada programs for
compiling, linking, and executing processing graphs and command
programs. PGSE was written and developed with the Digital Equipment
Corporation Ada Compilation System (DEC ACS), and has been ported to
the SUN-4 under the Telesoft TeleGen2 compiler. PGSE incorporates a
compiler, linker, and run-time shell (Fig. 1).

The PGSE compiler performs extensive syntactical error checking on the
source files, as well as verifying the conformance of the primitive
arguments with a library of primitive profiles. The linker accepts the
object modules produced by the compiler, and produces a single graph
realization load module. Extensive checks are performed to verify that
the interfaces between graphs and subgraphs are consistently defined.
3.1 The PGSE Run-Time Shell. The run-time shell permits the user to
debug and execute the graph realization module created during the link
step. The executing graph can read data from ASCII-formatted data
files, process the data, and produce formatted output files. The PGSE
run-time shell includes an extensible library of signal processing
primitives compliant with the AN/UYS-2 primitive specification. PGSE
is distributed with the source code for all supported primitives, and
a set of Ada packages to assist the user in developing new primitives
[11, part 2, chapter 7].

During graph execution, the shell evaluates run-time expressions and
continually checks the interrelationship of the NEPs. The constraints
include the requirements that all NEPs except valves are greater than
or equal to zero; and that threshold amounts are greater than or equal
to both the consume amount, and the sum of the read and offset
amounts. Primitive specific constraints are likewise checked for data
compliance. Warnings will also be issued if any graph system or
primitive-specific constraint is violated.

3.2 Shell Commands. Commands to the shell can be entered either
interactively, or by writing a formal Ada command program. The
original Ada command program environment was developed in 1988; more
recently, a shell has been implemented which will process Ada programs
written in accordance with the AN/UYS-2A command program environment.
Many of the interactive commands emulate command program functions.
Commands are defined to create and destroy dynamic queues or graph
variables; to link and unlink the queues or graph variables from a
graph; to deposit, examine, and remove data from either dynamic and
local queues or graph variables; and to start and stop I/O processes.
The graph itself can be started and halted, and the scheduling of
nodes suspended and resumed.

An executing graph can be halted by suspending node scheduling,
stopping graph I/O, or by setting a breakpoint for a node or
primitive. The use of breakpoints provides a major tool for symbolic
debugging of the graph. Breakpoints can be set to execute after some
specified number of node or primitive executions, and prior to the
next execution of the entity. A breakpoint will be either temporary or
permanent; if permanent, it must be explicitly canceled or the
scheduler will stop on every subsequent occurrence of the specified
condition. A graph can also be executed one node at a time.


The trace function is identical to the breakpoint function; however,
it only notifies the user when a given condition has been met, rather
than halting graph execution. Watchpoints can be set to suspend graph
execution whenever a particular NEP is modified outside of a specified
data range. The watchpoint function is useful for analyzing graphs
with variable NEPs.
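
The watchpoint condition amounts to a range test on a modified NEP value. The Python sketch below is a hypothetical illustration of that test; the class name, its fields, and the inclusive treatment of the range bounds are assumptions, not details taken from PGSE.

```python
# Sketch of a watchpoint range test. Watchpoint and should_suspend are
# hypothetical names; treating the bounds as inclusive is an assumption.
from dataclasses import dataclass


@dataclass
class Watchpoint:
    nep_name: str  # which NEP is being watched, e.g. "threshold"
    low: int       # lowest value that does not trigger the watchpoint
    high: int      # highest value that does not trigger the watchpoint

    def should_suspend(self, new_value: int) -> bool:
        """Suspend graph execution when the new NEP value leaves the range."""
        return not (self.low <= new_value <= self.high)


watch = Watchpoint(nep_name="threshold", low=0, high=1024)
inside = watch.should_suspend(512)
outside = watch.should_suspend(2048)
```

In this example, `inside` is false and `outside` is true, so only the second modification would suspend the graph.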

When the graph execution is suspended, different entities within the
graph can be examined. The user can display the status of the node,
and the status of each of its input and output queues. All breakpoints
can be listed. When a queue is examined, the data it contains can be
read to a file, listed on the screen, or plotted to the screen. All
formal parameters associated with nodes, queues, and graph variables
can be displayed.

Log and history files capture the history of the graph execution. The
log file records the sequence of interactive commands processed by the
shell, and the history file saves the sequence of node executions,
including the path to the node and the underlying primitive.

4. A Signal Processing Example

Fig. 2 illustrates an elementary four-node processing graph which
performs filtering, spectral analysis, integration, and complex
magnitude estimation. This representation was generated with the GEWS
workstation. The nodes are shadowed, and the queues are in boldface:
this convention, in compliance with the processing graph pictorial
standard [5], indicates that each of these queue and node entities
represents a family of graph instances. That is, there will be NC
copies of the graph executing in parallel, each on a separate sensor
channel.

In the case of the queues, the family structure is made explicit by
the bracketed indices which precede the queue names ([1..NC]Q_IN,
[1..NC]Q1, etc.). The ith node or queue in a family will be designated
as [I]ENTITY_NAME. The syntax for the SPGN specifications is also
straightforward. A family of NC queues of mode float can be specified
as

%QUEUE  (  [1..NC]Q_IN_FFT  :  FLOAT  )
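
The family naming convention can be made concrete with a short helper that lists the members implied by a declaration such as the one above. This Python sketch is illustrative only; `family_members` is a hypothetical function and is not part of SPGN or PGSE.

```python
# Sketch: expand a family declaration of NC entities into the names of
# its members, following the [I]ENTITY_NAME convention from the text.
# family_members is a hypothetical helper introduced for illustration.

def family_members(entity_name: str, count: int) -> list:
    """Return the member names [1]NAME .. [count]NAME of a family."""
    if count < 1:
        raise ValueError("a family must contain at least one member")
    return [f"[{index}]{entity_name}" for index in range(1, count + 1)]


queues = family_members("Q_IN_FFT", 3)
print(queues)
```

With NC set to 3, the helper yields one queue name per sensor channel, each carrying its index in brackets ahead of the shared entity name.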

A node which calls the primitive real-to-complex FFT can be declared
as

%NODE  (  [I=1..NC]FFT
    PRIMITIVE  =  FFT_RC
    PRIM_IN    =  N_IN,
                  N_OUT,
                  FLAG,
                  BIN_NUM,
                  [I]Q_IN_FFT
                      THRESHOLD  =  N
                      CONSUME  =  N*a*OL^
    PRIM_OUT   =  [I]Q_OUT_FFT  )

where N_IN, N_OUT, FLAG, and BIN_NUMBER are the number of points per
transform, the number of output points per transform, the direction
flag, and the starting bin number, respectively. [I]Q_IN_FFT and
[I]Q_OUT_FFT are the ith members of a family of input queues and
output queues, respectively. The SPGN for the example was generated on
the Macintosh Graphic Entry Workstation (GEWS) [12], but it can also
be written by hand.

The application can be started and managed either by a sequence of
interactive commands, or by an Ada command program. In either case,
the command sequence must be:

•  define the formal graph variables and/or queues
connected to the graph.

•  define the I/O processes for the formal queues
and associate the I/O processes with the input
and output data files.

•  start all I/O processes except for one input process.

•  start the graph via the START command.

•  start the remaining input process.

Figure 3 shows the transformed data on the input, output, and internal
queues of the graph. The data was acquired by setting breakpoints on
the appropriate nodes and plotting the queue data to the screen. The
input to the graph was simulated time-series data generated by summing
a degraded sinusoidal signal and three of its harmonics. A practical
difficulty is that PGSE cannot examine data on the formal queues. The
original graph was therefore augmented by replacing the input and
output ports by a family of concatenated nodes and replicated nodes,
respectively.

5.  The  Status  and  Future  of  PGSE 

GEWS, available from the Naval Research Laboratory [13], has been
widely distributed, although it is no longer being maintained against
the current PGM specification. The Navy-sanctioned workstation tools
for AN/UYS-2 application development are the Sun-based gred/grail
utilities, developed by AT&T Bell Laboratories [14]. Gred is the
iconic capture system, and grail translates the gred output files into
SPGN.

PGSE, like GEWS, has been widely distributed, both to industrial users
and to Navy laboratories [13]. Aside from the AN/UYS-2, the VAX- and
SUN-based PGSE system currently provides the only available means for
developing PGM applications and primitives. New primitives can be
inexpensively prototyped and tested in Ada. PGSE is of particular
value to teams of software developers with limited access to the
AN/UYS-2, which is effectively a single user machine, although the
speed of PGSE graph execution is generally insufficient for real-time
applications. PGSE is now being used by Hughes Aircraft to prototype
applications for the Active Low-Frequency Sonar (ALFS). It has been
employed at NRL and NAWC to prototype passive sonar applications.

NRL is currently testing a new release of PGSE which
incorporates an AN/UYS-2A Ada command program shell
[15]. A new interface will permit users to execute subgraphs
as single nodes; this permits the user to simulate compound
primitives ("chains") constructed from subgraphs with nodes
using existing primitives. New tools to simplify the coding
of Ada primitives have been proposed.

The development of PGSE has been driven by the Navy's need
for rapid prototyping of AN/UYS-2 applications. Because
the PGSE implementation is independent of any specific
target machine, the system has wide utility. The domain of
applications is a function only of the primitive libraries.
Although the current primitive library is based upon the
AN/UYS-2 primitive specification [16], primitives for C3I
applications and Bayesian inference networks have also been
written.
Abstract

The need to establish a coherently integrated
methodology  for  engineering  complex  real-time 
computer  systems  has  been  taken  seriously  in 
recent  years  by  both  researchers  and  practitioners 
dealing  with  safety-critical  computer-based 
applications.  In  our  view,  a  missing  cornerstone 
for  developing  such  an  engineering  methodology 
is  a  system  (and  component)  model  which  is 
effective  not  only  for  abstraction  and  stepwise 
refinement  of  complex  real-time  computer 
systems  but  also  for  representing  and  providing  a 
basis for analysis of the application environments.
Only with the establishment of such a
modeling scheme can one hope to realize a
method for requirements analysis and specification
which will produce a solid information base
for both systematic design and rigorous evaluation.
The first author and his research collaborator,
Hermann Kopetz, jointly formulated an initial
framework  of  such  a  desirable  system  model  a 
few  years  ago.  It  was  named  the  real-time  object 
(RT-object) model. The RT-object model, an
extension  of  the  conventional  object  model, 
possesses  capabilities  for  precise  handling  of  the 
timing  behavior  of  modeled  subjects.  A  basic 
thesis  we  have  attempted  to  validate  is  that  any 
application  environment,  not  only  control 
computer  systems,  can  be  modeled  as  a  set  of 
interacting  RT-objects.  The  potential  of  the  RT- 
object  model  as  a  backbone  of  an  integrated 
methodology  for  engineering  complex 
dependable  systems  is  discussed. 

1.  Introduction 

Techniques  for  engineering  complex 
dependable real-time systems (DRS's) have been
a  subject  of  intensive  research  for  more  than  two 
decades  but  by  and  large  they  have  evolved  in 
insufficiently  integrated  forms  up  to  now.  As  a 
consequence,  we  perceive  that  the  following 
conditions  in  the  current  practice  in  system 
engineering exist:

(1) Lack of rigor in requirements specification:

Particularly problematic is the specification
of temporal behavior requirements and
dependability requirements. As a result, the
requirements-design traceability has been
generally weak.

(2)  Weak  traceability  among  various  system 
models: 

The  traceability  among  various  system 
models  used  during  high-level  design,  validation, 
and  evaluation  has  been  generally  weak.  The 
consequence  has  been  the  poor  interoperability 
and  coherence  among  various  tools  mobilized 
during  system  development  and  evaluation. 

(3) Lack of integration in design techniques:

Most real-time fault tolerance techniques
stand unnecessarily in isolation while in fact
many of them are complementary in nature.
Cost-effective integration of such techniques
remains unachieved. Also, approaches have not
been sufficiently developed for optimizing
allocation of resources which would realize
efficient implementation of real-time distributed
computer systems (DCS's). Naturally, integration
of effective resource allocation approaches with
fault tolerance techniques has been delayed.

In  our  view,  the  key  issue  to  be  resolved  is 
the  uniformity  and  achievable  accuracy  in 
representation  of  both  application  environments 
and  designs  at  different  levels  evolving  during  the 
system  development  cycle.  That  is,  a  desired 
representation  (or  modeling)  scheme  should  be 
effective  not  only  in  the  abstraction  of  real-time 
(computer)  control  systems  under  design  but  also 
in  the  representation  of  the  application 
environments.  Such  a  modeling  scheme  should 
allow  variable-accuracy  representations  ranging 
from  a  full-detail  functional  specification  to  a 
high-level  structural  representation.  In  particular, 
it  should  be  capable  of  handling  temporal 
characteristics  at  various  degrees  of  accuracy. 
Only  with  the  establishment  of  such  a  modeling 
scheme  can  one  hope  to  realize  a  method  for 
requirements  analysis  and  specification  which 
will  produce  a  solid  information  base  for  both 
systematic  design  and  rigorous  evaluation. 

The  first  author  and  his  research 
collaborator,  Hermann  Kopetz,  jointly  formulated 
an initial framework of such a desirable modeling
scheme in [Kop90a, Kop90b]. It was named the
real-time object (RT-object) model. The RT-object
model, an extension of the conventional
object  model,  possesses  capabilities  for  precise 
handling  of  the  timing  behavior  of  modeled 
subjects. Although the term "real-time object"
has  appeared  in  literature  quite  a  few  times  and 
there  is  some  common  ground  with  what  was 
referred  to  as  a  real-time  object  in  previous 
literature  [Bih89,  Rum91],  the  real-time  object 
model  discussed  in  this  paper  possesses  some 
concrete  unique  characteristics.  In  order  to 
distinguish  the  real-time  object  model  discussed 
here  and  in  [Kop90a,  Kop90b]  from  others,  it  is 
often  denoted  by  RTO.k. 

Besides  the  timeliness  of  the  output,  an 
attribute  that  is  of  particular  importance  in 
complex real-time systems is dependability which
encompasses reliability, availability, and security
[Lap92]. The current practice in handling
dependability  during  the  system  engineering 
cycle  lacks  a  coherent  systematic  process.  The 
need  for  establishing  an  integrated  methodology 
is especially acute in this area. Again, we argue
in  this  paper  that  the  RTO.k  model  is  a  promising 
framework  for  building  up  such  a  methodology. 

Therefore,  the  main  theses  put  forth  in  this 
paper are:

(1) The RTO.k object model has strong
uniformity and unbound achievable accuracy in
representing  both  application  environments  and 
computer  control  systems  under  design,  and 

(2)  The  RTO.k  object  model  is  a  promising 
backbone  of  a  coherently  integrated  system 
development methodology, in particular, a
framework  in  which  various  complementary 
dependability  assurance  techniques  can  be 
integrated. 

2.  The  RT-object  model 
2.1 Motivation

In  our  previous  study  dealing  with  the 
subject  of  DRS's,  the  need  for  a  model  such  as  the 
RTO.k object model was encountered in several
different  contexts.  First,  during  the  attempts  to 
validate  rigorously  the  effectiveness  of  certain 
real-time fault tolerance mechanisms in the
context  of  real-time  DCS's,  questions  such  as  the 
following  were  encountered: 

How  are  time-out  values  to  be  chosen  ? 

How  are  the  recovery  time  requirements  to 
be  determined  ? 

Secondly, in trying to extend the approaches
formulated for scheduling hard-real-time tasks in
single-node (including multiprocessor based)
computer systems [Kim80] into the approaches
applicable to DCS's, the following questions were
encountered:

How  is  the  urgency  of  a  task  to  be 
determined  ? 

What  is  the  relationship  between  stimulus- 
to-response  deadlines  and  urgencies  of 
tasks  ? 

How  are  the  concurrency  and  contention  for 
processing,  storage,  and  communication 
resources  to  be  reflected  in  determining 
an  optimal  allocation  ? 

How  are  dynamic  system  reconfiguration 
actions  to  be  accommodated  without 
endangering  the  output  timeliness  ? 

It  appeared  that  the  approaches  formulated  for 
single-node  systems  were  too  simplistic  to  be  of 
use  as  a  baseline  for  deriving  approaches 
applicable  to  sizable  real-time  DCS's. 

Thirdly,  in  searching  for  approaches  for  the 
rigorous  specification  of  timing  requirements,  the 
following  questions  were  encountered: 

Would it be sufficient to associate timing
specifications with procedure-segments
and messages crossing node boundaries ?

How are fault tolerance actions to be
accommodated in timing specifications ?

Our  conclusion  was  that  all  the  above  questions 
pointed  to  one  basic  law:  rigorous  handling  of 
the  temporal  behavior  of  a  DCS  requires  a  global 
view  of  the  system  and  a  top-down  engineering. 
Consequently,  a  model  of  a  system  or  its 
components  that  facilitates  a  global  view  and 
stepwise  refinement  of  the  view  was  recognized 
as  the  key  item  needed  in  resolving  the  questions 
raised  above.  So,  a  search  began  and  at  the  outset 
the  object  model  appeared  to  be  a  natural 
building-block  for  any  modular-structure  system 
[Dah72,  Boo91,  Rum91].  However,  conventional 
object  models  did  not  appear  to  have  sufficient 
concrete  mechanisms  that  provide  power  for 
accurate  modeling  of  application  environments. 

2.2 The essence of the RT-object model, RTO.k

The  RT-object  model  formulated  by  Kopetz 
and Kim in [Kop90a, Kop90b], often denoted by
RTO.k,  is  an  extension  of  the  conventional  object 
model(s).  As  an  object  model  it  is  : 

(1)  independent  of  the  language  (textual  or 
graphic)  used  to  program  or  specify  object 
designs,

(2)  independent  of  the  way  inheritance  is 
facilitated,  and 

(3)  independent  of  the  service-call  mechanisms 
or  the  message  protocols  by  which  objects 
exchange information on services requested and
completed. 

It  is  a  stepwise  refinable  structure  with 
abstract  data  type  characteristics.  It  is  an 
extension  of  the  conventional  object  model(s)  in 
at least three essential ways:

(1) For each execution of a method of an RTO.k
object, a deadline is imposed;

(2) For some methods of an RTO.k object a real-time
clock serves as the mechanism for triggering
the method executions as a function of real time
and such methods are called time-triggered (TT-)
methods; and

(3)  Real-time  data  contained  in  an  RTO.k  object 
become  invalid  after  the  interval  called  the 
maximum  validity  duration  passes. 
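
The three extensions can be pictured with a small Python sketch. It is only an illustration under assumptions: the class names `RealTimeDatum` and `TTMethod`, the periodic activation pattern, and the use of relative deadlines are hypothetical choices made here, not part of the RTO.k formulation in [Kop90a, Kop90b].

```python
from dataclasses import dataclass


@dataclass
class RealTimeDatum:
    """A stored value with a maximum validity duration, as in extension (3)."""
    value: float
    timestamp: float     # time at which the value was observed
    max_validity: float  # maximum validity duration

    def is_valid(self, now: float) -> bool:
        # The datum becomes invalid once the validity interval has passed.
        return now - self.timestamp <= self.max_validity


@dataclass
class TTMethod:
    """A time-triggered method, extension (2), with a deadline, extension (1)."""
    name: str
    period: float    # assumed periodic activation by the real-time clock
    deadline: float  # relative deadline for each execution (assumption)

    def activations(self, horizon: float):
        """Yield (start, absolute deadline) for activations in [0, horizon)."""
        k = 0
        while k * self.period < horizon:
            start = k * self.period
            yield start, start + self.deadline
            k += 1
```

For instance, `list(TTMethod("track", 0.5, 0.25).activations(1.5))` enumerates three activations, each paired with the time by which its execution must finish.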

In  addition,  the  following  constraint  on  method 
executions  is  incorporated  into  the  RTO.k  model. 

(4)  A  basic  concurrency  constraint  which 
prevents  conflicts  between  TT-methods  and 
message-triggered  methods  is  incorporated.  In 
general,  activations  of  object  methods  triggered 
by  messages  from  external  clients  are  allowed 
only  when  TT-method  executions  are  not  in 
place.  To  be  exact,  when  a  message-triggered 
method  is  not  free  of  data  conflict  with  a  TT- 
method,  execution  of  the  former  (message- 
triggered)  method  must  not  be  allowed  in  a  time 
zone  separated  by  less  than  a  certain  system- 
dependent  constant,  say  g,  from  a  TT-execution 
of  the  latter  method.  This  restriction  is  called  the 
basic  concurrency  constraint  or  state  accessibility 
constraint.  Note  that  this  basic  constraint  does 
not  impose  any  restriction  on  concurrent 
execution  of  TT-methods  or  concurrent  execution 
of  message-triggered  methods. 
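
The basic concurrency constraint can be expressed as a simple admission check. The sketch below is a minimal illustration under assumptions: TT-method executions are treated as instants, and the function name and parameters are hypothetical.

```python
def message_method_allowed(t, tt_execution_times, g, data_conflict=True):
    """Return True if a message-triggered method may execute at time t.

    tt_execution_times: instants at which the conflicting TT-method executes
                        (treating executions as instants is an assumption).
    g:                  the system-dependent separation constant.
    data_conflict:      False when the two methods are free of data conflict,
                        in which case the constraint does not apply.
    """
    if not data_conflict:
        return True
    # Disallow execution closer than g to any TT-execution.
    return all(abs(t - tt) >= g for tt in tt_execution_times)
```

With TT-executions at times 0.0 and 2.0 and g = 0.5, a message-triggered method would be admitted at t = 1.0 but not at t = 1.8.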

Figure  1  depicts  the  essential  components  of 
the  RTO.k  object  model.  The  extension  (1), 
namely  the  deadline  imposed  on  a  method 
execution,  has  been  mentioned  in  every 
discussion  of  the  term  real-time  object.  In 
addition,  the  notion  of  calling  for  an  object 
method  as  real-time  reaches  some  predetermined 
time-points  was  mentioned  in  some  of  the 
literature  in  which  the  term  real-time  object  was 
used,  although  the  approach  of  clearly 
distinguishing between TT-methods and
message-triggered methods adopted in the extension (2) is
probably  a  unique  feature  of  the  RTO.k  model. 
The  extension  (3)  (use  of  the  maximum  validity 
duration  to  eliminate  old  useless  data)  and  the 
extension  (4)  (basic  concurrency  constraint)  are 
unique  features  of  the  RTO.k  model.  Therefore, 
on  one  hand,  the  RTO.k  object  model  is  general 
in  that  it  is  independent  of  the  specification  / 
design  language,  the  inheritance  mechanism,  and 
the  inter-object  message  protocols  used.  On  the 
other hand, it has a new concrete structure in that
it contains deadlines, TT-methods, maximum
validity duration, and the basic concurrency
constraint. The unique characteristics of the
RTO.k object model together with its relatively
recent formulation are also indications that the
model as it currently stands is more of a
framework of an evolving model, not a fully
developed mature tool.

Figure 1. RTO.k - A real-time object model

The  proposition  that  an  object  model  can  be 
used  to  represent  not  only  computer  systems  but 
also  application  environments  is  not  new 
[Dah72]. This originates from the belief that
each  real-time  application  environment  can  be 
viewed  as  a  set  of  state  variables  that  interact 
among  themselves.  However,  unlike  other  object 
models  the  TT-method  facility  in  the  RTO.k 
object  model  enables  representation  of  the 
application  environments  "to  any  degree  of 
accuracy  desired"  within  the  limit  of  the  user's 
knowledge  about  the  environment.  That  is,  the 
clock-driven  activation  of  object  methods  is  a 
natural  mechanism  for  representing  the 
concurrent  and  continuous  changing  of  the  state 
variables  which  are  typical  situations  in  the 
application  environments.  This  will  be  illustrated 
in  the  next  section.  Additional  mechanisms  for 
specifying  certain  parallelism  existing  among  the 
methods  of  RTO.k  objects,  e.g.,  mechanisms  of 
COBEGIN  or  FORK-JOIN  nature,  are  also 
allowed in the RTO.k object model as in many
conventional  object  models. 

The  basic  concurrency  constraint  is  not  only 
necessary  for  keeping  the  semantics  of  the  RTO.k 
model  in  an  unambiguous  form  but  also  can  be 
useful  in  ensuring  consistency  among  RT-objects. 
The  latter  point  is  particularly  acute  when  RT- 
objects  are  replicated  while  updating  the  object 
data  store  in  each  object  is  the  exclusive 
responsibility  of  TT-methods.  It  is  easy  to  design 
such  object  replicas  with  the  assurance  that  they 
will  never  provide  different  values  with  the  same 
time-stamp  to  their  clients  [Kop90b]. 

A  fundamental  notion  established  in 
association  with  the  RTO.k  object  model  is  that 
of  temporal  accuracy  which  refers  to  the  time  gap 
between  a  state  variable  in  the  application 
environment  and  its  representation  in  a  computer- 
internal RT-object [Kop90a, Kop90b]. This
temporal  accuracy  is  a  notion  fundamentally 
related  to  the  output  timeliness.  Therefore,  we 
believe  that  much  of  the  temporal  behavior 
requirements  imposed  on  control  computer 
systems can be stated in natural and rigorous
forms  in  terms  of  temporal  accuracy  bounds. 
Temporal  accuracy  bounds  are  then  major  drivers 
for  determining  many  other  temporal  entities  such 
as  deadlines  for  object  method  executions. 
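
One way to read the relation between temporal accuracy bounds and deadlines is sketched below in Python. The simple subtraction used here and the function names are assumptions introduced for illustration; the paper does not give a formula.

```python
def temporal_error(observed_at, used_at):
    """Time gap between observing a state variable and using its image."""
    return used_at - observed_at


def remaining_budget(observed_at, now, accuracy_bound):
    """Time left before the stored image violates its accuracy bound.

    A method that consumes the image would need a deadline no later than
    now + remaining_budget(...) for the bound to hold (assumed reading).
    """
    return accuracy_bound - temporal_error(observed_at, now)
```

For example, an image observed at time 10.0 with an accuracy bound of 2.0 leaves a budget of 1.5 at time 10.5, which bounds the deadline of any method execution that uses it.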

3.  An  RT-object  based  approach  to 
environment  modeling  and 
requirements  specification 

The object model was initially formulated
and used in simulation applications [Dah72].
Therefore, use of the object model in modeling
the environment objects, i.e., modular entities in
the environment which have time-varying internal
states,  has  been  practiced  for  a  long  time. 
However,  as  mentioned  in  the  preceding  section, 
the  recently  formulated  version  of  the  object 
model,  the  RTO.k  object  model,  supports  more 
accurate  detailed  modeling  of  environment 
objects.  This  aspect  and  how  the  RTO.k  object 
model  can  be  used  during  the  requirements 
specification  and  high-level  design  steps  are 
discussed  in  this  section  with  examples  derived 
from  the  defense  area. 

Consider  an  anti-missile  defense  scenario 
depicted  in  Figure  2.  The  environment  in  this 
context  means  a  sky  space  segment  of  interest, 
called  the  "theater",  and  any  moving  objects  in 
that  theater  including  a  valuable  target  to  be 
defended  (e.g.,  a  ship)  and  flying  objects  (e.g., 
hostile reentry vehicles (RV's) and
non-threatening slow-moving objects).


Figure  2.  An  anti-missile  defense  system 

Initially  the  top-level  requirements  given  by 
the  customer  who  places  an  order  for  the  defense 
system  are  as  follows. 

(1)  Each  RV  should  be  intercepted  if  it  is 
dangerous. 

(2)  If  there  are  more  dangerous  RV's  than 
the  interceptors,  then  the  early  arriving  RV's 
should  be  intercepted  and  the  defense  target 
should  be  moved  toward  a  safer  location. 
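
Read operationally, the two top-level requirements amount to a simple allocation rule. The following Python sketch illustrates that reading; the tuple layout and the function name `plan_defense` are hypothetical, and how an RV is judged dangerous is left outside the sketch.

```python
def plan_defense(rvs, n_interceptors):
    """Choose which RVs to intercept and whether to move the defense target.

    rvs: list of (rv_id, arrival_time, dangerous) tuples (assumed layout).
    Returns (ids of RVs to intercept, move_target flag).
    """
    # Requirement (1): only dangerous RVs need to be intercepted.
    dangerous = sorted((rv for rv in rvs if rv[2]), key=lambda rv: rv[1])
    # Requirement (2): with too few interceptors, take the earliest arrivals
    # and move the defense target toward a safer location.
    intercepted = [rv[0] for rv in dangerous[:n_interceptors]]
    move_target = len(dangerous) > n_interceptors
    return intercepted, move_target
```

With three dangerous RVs and two interceptors, the two earliest arrivals are selected and the move flag is set.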

The  system  designer  will  first  decide  on  the  set  of 
sensors  and  the  set  of  actuators  (e.g.,  interceptors) 
to  be  deployed.  Thereafter,  the  functions  of  the 
computer  based  control  system  will  be 
determined  based  on  the  control  theory  logic 
adopted. 

If  some  sensors  and  actuators  chosen  are 
located  in  the  theater,  then  a  representation  of  the 
application  environment  must  be  expanded  to 
include  these  newly  chosen  environment  objects. 
An RTO.k representation of the initial theater
(before  deciding  on  the  sensors  and  actuators  to 
be  incorporated  within  the  theater)  is  depicted  in 
Figure  3. 

The  internal  storage  of  this  high-level  RT- 
object  basically  consists  of  the  space  in  the 
theater,  a  defense  target  (ship),  and  a  dynamically 
varying  number  of  RV's.  Therefore,  the 
information  kept  in  this  RT-object  is  a 
composition  of  the  information  kept  in  the 
defense  target,  the  RV's,  and  any  other  object  in 
the theater space. A noteworthy property here is
that each of these components that are treated as
components  of  the  internal  storage,  i.e.,  the 
defense  target  and  the  RV's  themselves,  can  in 
turn  be  represented  as  an  RT-object. 
State Space

  - Internal storage (each item is, in turn, an RTO):
      • (0-n) RVs
      • Defense target
      • Theater space
      • Max validity duration = 0

  - Access rights for external objects:
      • None

Methods

  - Spontaneous (TT-invocation, driven by an infinite-precision clock):
      • RV position & acceleration
      • Target position & acceleration

  - Service (instant service):
      • Read-location (RVs & target): service for sensors
      • Set-direction & accel'n (target)

Conditions

      • Velocity & acceleration of each RV: -
      • Velocity & acceleration of defense target: -
      • # of RVs at any given time: -
      • Read-location(target) may return a constant value
      • ∃ parallelism among service methods.

Figure 3. An RTO.k specification of the environment

The  methods  in  this  top-level  RT-object 
model  for  the  theater  are  classified  into  two  types. 
One  class  of  methods  are  the  TT-methods  and 
they  are  labeled  spontaneous  methods  in  Figure  3. 
Conceptually  these  methods  are  activated 
continuously  and  each  of  their  executions  is 
completed  instantly.  If  we  adopt  the  less  precise 
version  of  the  model  in  which  the  time  domain  is 
a  discrete  domain  and  the  time  gap  between  two 
instants adjacent in the domain is called a clock
tick,  then  a  less  accurate  representation  of  the 
environment results. That is, such a view dictates
the activation frequency of any spontaneous
method to be no more than once per every clock
tick  while  allowing  each  execution  to  be 
completed  before  or  by  the  time  of  the  following 
activation  of  the  same  method.  Therefore,  the 
spontaneous  methods  are  the  mechanisms  for 
representing  (or  simulating)  continuous  state 
changes  that  occur  naturally  in  the  environment 
objects.  The  natural  parallelism  that  exists 
among  the  environment  objects  is  precisely 
represented  by  use  of  multiple  TT-methods  which 
may  be  activated  simultaneously.  In  general,  the 
accuracy  of  an  RT-object  representation  of  the 
environment  is  a  direct  function  of  the  activation 
frequencies  of  TT-methods. 
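
The dependence of accuracy on activation frequency can be seen in a small Python sketch, in which a spontaneous method advances the position of an object under constant acceleration once per clock tick. The explicit Euler update and the constant-acceleration motion are assumptions chosen for this illustration.

```python
def simulate_position(accel, duration, tick):
    """Advance position and velocity once per clock tick (explicit Euler)."""
    pos, vel = 0.0, 0.0
    steps = round(duration / tick)
    for _ in range(steps):
        # One activation of the spontaneous (TT-) method per tick.
        pos += vel * tick
        vel += accel * tick
    return pos


def exact_position(accel, duration):
    """Closed-form position for constant acceleration from rest."""
    return 0.5 * accel * duration ** 2
```

For an acceleration of 2.0 over a duration of 10.0 the exact position is 100.0; the tick-driven model gives 90.0 with a tick of 1.0 and 95.0 with a tick of 0.5, so halving the tick halves the error in this case.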


The other class of methods are the message-triggered
methods that are found in conventional
objects.  They  are  labeled  service  methods  in 
Figure  3.  They  are  invoked  upon  requests  from 
the  clients.  However,  the  clients  here  are  unusual 
ones,  i.e.,  sensors  and  actuators  interfacing  with 
environment  objects.  Figure  3  corresponds  to  the 
case  where  some  sensors  to  be  located  outside  the 
theater  have  been  chosen  but  no  sensors  and 
actuators  to  be  located  inside  the  theater  have  ^et 
been  chosen.  Sensors  work  to  obtain  information 
about  the  states  of  environment  objects,  primarily 
locations,  movement  directions,  and  some 
simatures  of  RVs  and  the  defense  target.  This 
relationship  between  sensors  and  environment 
objects  can  be  represented  partially  hy  the  service 
methods  such  as  "Read-location  (environment 
object)”  in  Figure  3,  with  the  understanding  that 
such  a  method  is  executed  at  the  instant  at  which 
a  sensor  makes  an  observation  of  the  environment 
state. 

Similarly,  actuators  work  to  make  impact  on 
the  conditions  and  future  courses  of  environment 
objects.  In  this  example,  the  only  possible  impact 
that  can  be  made  on  RVs  by  the  system  being 
designed  is  the  collision  of  the  interceptors 
against  the  hostile  RVs.  These  interceptors  are 
actuators  produced  at  an  early  stage  of  the  system 
design  and  once  they  are  produced,  they  should 
be  treated  as  environment  objects  in  the  theater  as 
well.  Although  Figure  3  represents  the  theater 
before introduction of such actuators, there are
some  control  points  in  the  defense  target  which 
can  be  accessed  from  the  computer  system, 
typically  structured  as  a  control  computer 
network  (CCN),  via  a  communication  channel, 
e.g.,  a  radio  communication  device.  The  "Set- 
direction  &  acceleration  (target)"  service  method 
in  Figure  3  represents  such  possibility. 

The  last  important  component  of  an  object 
model  is  the  set  of  constraints  that  governs  both 
information  states  of  the  object  and  the 
computational  results  produced  by  object 
methods.  In  Figure  3,  these  constraints  are  listed 
in  the  section  labeled  Conditions.  All  the 
constraints  listed  are  of  the  "laws  of  physics" 
type. Therefore, the system designer's knowledge
of  physics  is  expressed  in  the  Conditions  section. 
One  of  the  constraints  in  Figure  3  specifies  the 
parallelism  that  exists  among  service  methods, 
i.e.,  the  parallelism  among  the  sensor  activities 
and  actuator  activities.  This  is  an  example  of  a 
concurrency  constraint.  Concurrency  constraints 
are  integral  components  of  RT-objects. 

As  mentioned  earlier,  a  single  RT-object 
specification  of  the  environment  can  be  refined 
into  a  network  of  RT-object  specifications,  each 
corresponding  to  a  different  environment  object. 
All the knowledge contained in the Conditions
section  of  the  single  RT-object  specification 
should  be  retained,  most  likely  in  a  scattered 
form,  in  the  network  of  environment  object 
specifications.  Additional  knowledge  may  also 
be  introduced  during  the  refinement  process. 

Note: The defense scenario discussed
above was chosen because it had been established
in  a  LAN  based  distributed  computer  system 
testbed  in  the  authors'  laboratory  several  years 
ago.  It  is  a  rich  scenario  for  use  as  a  test  ground 
for  various  advanced  approaches  to  dependable 
real-time computing. The existing
implementation is structured in the conventional
object-oriented style and not based on the RTO.k object
structuring.  A  new  implementation  based  on  the 
RTO.k  object  structuring  is  under  way. 


4.  The  RT-object  model  as  a  potential 
backbone  of  an  integrated  methodology 
for  engineering  complex  DRS's 

There  are  many  important  implications  of 
the  RT-object  based  unified  representation  of 
both  control  computer  systems  and  the 
application environments. The RT-object model
is a potential basic structure that can evolve,
rather than being present temporarily, from the
requirements specification step through the
structured design step to the validation and
evaluation step.

Figure 4. High-level design via elaboration of RT-objects
(Panels: Control System, in which more control theory
knowledge is embedded; ENV'T, with more knowledge on physics.)

Figure 4 depicts the use of the RT-object
model  during  the  early  steps  in  the  system 
development  cycle.  Some  time  after  an  RT- 
object  based  specification  of  the  environment  is 
obtained,  an  RT-object  based  specification  of 
sensors,  actuators,  and  the  control  computer 
network (CCN) can be obtained on the basis of
the  control  theory  adopted.  The  Conditions 
section  of  the  RT-object  representing  the  CCN 
must  contain  information  on  the  requirements 
imposed  by  the  system  customer. 

For  example,  if  a  radar  sensor  and 
interceptors  as  actuators  are  adopted,  then  the 
Conditions  section  of  the  RT-object  specifying 
the  CCN  must  include  the  type  of  constraints 
shown in Figure 5.

All  the  specifications  (of  both  the 
environment and the CCN) may go through
further refinement which is essentially a
high-level design activity. As the environment
specification  is  refined,  more  knowledge  of 
physics  may  be  incorporated.  Similarly,  as  the 
CCN  specification  is  refined,  more  control  theory 
knowledge  may  be  incorporated. 

So  far,  the  discussion  has  been  focused 
mostly on the role that the RT-object model can
play in the requirements specification and
high-level design steps which are the earliest steps in
the system engineering cycle. Under the
RT-object based engineering methodology envisioned
here,  detailed  design  means  essentially  a  process 
of  converting  the  high-level  RT-object 
specifications  into  more  detailed  or  even  fully 
executable  RT-object  specifications. 

A  number  of  significant  benefits  are 
expected  to  accrue  from  such  a  practice.  A  few 
major  ones  are  listed  below.  However,  concrete 
demonstrations  of  these  benefits  are  yet  to  be 
seen  and  thus  the  following  should  be  treated  as 
potential  benefits. 

(1)  Requirements  specification: 

Requirements,  in  particular  temporal 
behavior  requirements  and  dependability 
requirements,  can  be  specified  in  rigorous  forms 
that  can  be  detailed  to  varying  degrees.  As 
mentioned  before,  these  requirements  are 
expressed in the Conditions section of the
RT-object specifications. Uniformity and unbound
achievable accuracy in representing both the
environment and the CCN are the fundamental
ingredients  enabling  rigorous  specification  of 
requirements.  Also,  better  information  flow  from 
the  requirements  specification  step  to  the 
State Space

  - Internal storage:
      • Sensor output file
      • Track file
      • Intercept plan file
      • Sensor schedule
      • Target status
      • Max validity duration: ??

  - Access rights for external objects:
      • RTO for Theater (including sensors and actuators)

Methods

  - Spontaneous (deadline: ?):
      • Sensor data acquisition (option 1)
      • Processing 1
      • Processing n
      • Sensor commands
      • Actuator commands

  - Service:
      • Start
      • Stop
      • Move-target
      • Process-data-from-intelligent-sensor (option 2)

Conditions:

      • Sensor data acquisition & sensor commands must
        take place to track RVs.
      • Once 'sensor data acquisition' finds a radar return
        data timestamped at t, a sensor command must be
        issued at t + Δ.
      • An intercept command must be issued before a
        dangerous RV reaches the altitude of x.
      • Target movement commands must be issued to
        make the defense target attempt to avoid dangerous
        RVs even after the interceptors are exhausted.

Figure 5. An RTO.k specification of the CCN


converting  each  high-level  RT-object  into  a 
network  of  software  (or  user)  objects  and 
hardware  (or  resource)  objects. 

(3)  Rigorous  validation: 

Validation  of  both  temporal  behavior  and 
dependability  can  be  performed  in  a  much  more 
systematic and rigorous manner than before.
This is again due to the improved rigor in
requirements specification and the improved
information  flow  during  the  system 
development  cycle. 

(4)  Compatibility  among  modeling  and 
evaluation  tools: 

Due  to  the  improved  information  flow 
from  the  requirements  specification  step  to  the 
validation  and  evaluation  step,  the  compatibility 
among  the  tools  mobilized  during  various  steps 
is  expected  to  improve  significantly. 

It  should  also  be  noted  that  an  RT-object 
based  system  engineering  methodology  can  be 
gainfully  enhanced  by  incorporating  additional 
modeling  and  analysis  techniques  (e.g.,  process 
oriented  models  and  data  flow  models  [Rum91, 
You89))  as  supplements  to  the  RT-object 
backbone  during  various  steps  of  the  system 
engineering  cycle. 

5.  Integration  of  dependability  design 
techniques  into  an  RT>object  based 

system  engineering  methodology 

Quite  a  few  complementary  techniques  for 
designing  highly  fault-tolerant  real-time  DCS's 
have  been  developed  over  the  years,  but  each 
has  matured  to  a  different  degree  in  isolation 
without  being  integrated  with  others.  Two  most 
important  basic  classes  of  techniques  are  as 
follows. 

( 1 )  Techniques  for  realizing  action-level  fault 
tolerance : 


structured  design  and  optimization  step  can  result 
due  to  the  common  RT-object  structure  that  is 
used  in  both  steps. 

(2)  Resource  allocation: 

Top-down  multi-level  resource  allocation 
can  be  facilitated  in  the  course  of  stepwise 
creation  of  object  hierarchy  designs  after  starting 
with  RT-object  based  requirement  specifications. 
Such  resource  allocation  approaches  are  expected 
to  offer  significant  advantages  over  the  bottom-up 
approaches  of  extrapolating  conventional 
uniprocessor  scheduling  techniques  to  deal  with 
DCS's.  Obviously,  resources  can  also  be 
represented  at  varying  degrees  of  accuracy  as  RT- 
objects  before  their  allocation.  Resource 
allocation  can  then  become  a  process  of 


These  techniques  aim  for  desiring  DCS's 
such  that  critical  actions  take  place  (i.e.,  each 
critical  real-time  task  produces  an  output  as 
specified)  in  spite  of  component  failures. 
Therefore,  th^  aim  for  much  higher  degree  of 
dependability  than  those  techniques  aimed  for 
merely  aborting  some  tasks  and  cleansing  system 
states  upon  component  failures.  For  tolerating 
hardware  faults  only,  the  most  basic  techniques 
formulated  are : 

Voting  TMR  (triple  modular  redundancy) 
(more  eenerally,  voting  NMR)  [Toy87], 
PSP  (pair  of  self-checking  processors)  (of 
which  a  special  case  is  the  pair-of- 
comparing-pairs  scheme  used  in  the 


62 


Stratus  system  [Wil85])  [Kim92], 
Temporary  blackout  handling  [Kim92]. 

For tolerating both hardware and software faults,
the most basic techniques are:

DRB (distributed recovery block) (which
uses the recovery block scheme as a
component) [Hec91, Kim89a, Kim92],
NVP (N-version programming) [Avi85].

More "expensive" techniques devised to
supplement the aforementioned basic techniques
include the distributed conversation scheme and
others [Kim89b].
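
As a rough illustration of the two families named above, the following Python sketch shows a majority voter (the core of voting TMR/NMR) and a sequential recovery block (the component scheme that DRB builds on). It is a minimal sketch under stated assumptions, not the schemes of the cited references: the function names, the exception choices, and the square-root example are hypothetical, and state restoration between try blocks, timing checks, and the distributed primary/shadow arrangement of DRB are all omitted.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of replicas (voting NMR)."""
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def recovery_block(
    x: Any,
    try_blocks: Sequence[Callable[[Any], Any]],
    acceptance_test: Callable[[Any, Any], bool],
) -> Any:
    """Run the try blocks in order; return the first result that passes the test."""
    for block in try_blocks:
        try:
            result = block(x)
        except Exception:
            continue  # a crashed try block counts as a failed attempt
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all try blocks failed the acceptance test")


# TMR: one of three replicas returns a wrong value and is outvoted.
tmr_result = majority_vote([42, 42, 17])


# Recovery block: the primary is faulty, so the alternate's result is used.
def faulty_sqrt(v: float) -> float:
    return -1.0  # hypothetical faulty primary


def alternate_sqrt(v: float) -> float:
    return v ** 0.5


def sqrt_ok(v: float, r: float) -> bool:
    return r >= 0 and abs(r * r - v) < 1e-9


rb_result = recovery_block(9.0, [faulty_sqrt, alternate_sqrt], sqrt_ok)
```

The voter masks a faulty replica without any knowledge of the application, whereas the recovery block relies on an application-specific acceptance test, which is what lets it catch design (software) faults as well as hardware faults.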

(2)  Techniques  for  DCS  diagnosis  and 
reconfiguration 

These techniques aim at minimizing the
periods during which sick organs are lurking in
DCS's. This means facilitating fast learning, by
each fault-free node, of faults occurring in other
parts of the DCS, and fast reconfiguration,
including functional amputation of faulty
components, reincorporation of repaired or new
components, and redistribution of tasks.

Basically  the  following  three  types  of  approaches 
are  conceivable: 

Centralized, 

Decentralized, 

Hybrid. 

Centralized approaches are simple and have been
considered from the beginning days of distributed
computing. Yet their integration with the
techniques for action-level fault tolerance has not
been fully accomplished. Decentralized
approaches are much less mature as a technology,
although again the basic concept is at least 20
years old. Hybrid approaches can be developed
in rigorous forms only after decentralized
approaches are well understood.

In integrating the aforementioned techniques
and others for designing fault-tolerant dependable
computing capabilities, the following major steps
need to be accomplished. Our basic thesis here
again is that the RT-object model provides a
natural framework for specifying dependability
requirements at varying levels of detail and also
for integrating various fault tolerance
mechanisms / techniques that have evolved in
isolation.

(1) Rigorous specification of dependability
requirements 

This is not a sufficiently mature technology
area in spite of the fact that the field of
fault-tolerant computing is at least 30 years old. For
the following reasons, the RT-object structure, as
mentioned earlier, offers a good framework in
which specifications of dependability
requirements can be incorporated. Initially, a
dependability requirements specification will
appear in the Conditions sections of high-level
RT-objects. The specification can then be
decomposed or refined as high-level RT-objects
are converted into networks of smaller
RT-objects. Uniformity of the representation
structures, maintained across the environment
specification and the CCN specification and also
maintained during stepwise refinement, is a major
ingredient enabling the rigorous and coherent
specification of dependability requirements and
the design of dependable computing capabilities.
Rigorous specification of dependability
requirements in turn facilitates cost-effective
integration of various dependability design
techniques evolved in isolation.

(2)  Integration  of  action-level  fault  tolerance 
techniques  and  DCS  diagnosis  and 
reconfiguration  techniques 

Since  these  techniques  require  certain 
allocation  of  hardware  resources,  a  model  that 
can  represent  both  functions  to  be  performed  and 
the  supporting  execution  resources  can  serve  as  a 
highly  valuable  guiding  structure.  The  RT-object 
model  is  a  highly  desirable  model  in  this  regard 
although  the  use  of  the  RT-object  model  for  this 
purpose  has  not  been  practiced  widely.  Active 
research  is  under  way  in  this  integration  area. 

(3)  Integration  of  fault  tolerance  design 
techniques  with  performance-guarantee 
oriented  design  techniques 

Performance-guarantee oriented design
techniques include both system structuring
techniques and resource allocation techniques
aimed at guaranteed response time. As
mentioned in the introduction, such rigorous
handling of the temporal behavior requires a
global view of the system and top-down
engineering. The RT-object based system
engineering methodology meets this requirement
effectively. Therefore, for this integration step
again the RT-object model can play an important
role. In addition, similar arguments can be made
for integrating security enforcement techniques in
a coherent manner into the RT-object based
system engineering methodology.

6.  Conclusion 

This paper has presented a proposition that
the RT-object model is not just an attractive
approach for complex system modeling and
structuring but is actually a potential
structuring backbone of a coherently integrated
methodology for engineering DRS's. As the first
step on our part toward formulating concrete
examples illustrating the feasibility of such a
methodology, a specific version of the RT-object
model, denoted by RTO.k, was formulated
several years ago, and an experimental
specification and design of a simplified
application system based on the RTO.k
object model is under way. There are indications
that many other research organizations share our
feeling about the appealing nature of object based
approaches to design of complex real-time
systems. This raises hope that an increasing
number of demonstrations of the potential
benefits of using RT-object models in various
parts of system engineering will be seen in open
forum in the future. In spite of the promising
nature of the RT-object based approaches to
system engineering, however, concrete
demonstrations in this area are not expected to be
simple research endeavors in terms of efforts and
costs required.

Abstract

Typical among modern applications, such as AEGIS, are (1) integration of multiple existing
large systems, as well as development of new systems and subsystems, (2) complex
and often conflicting objectives (security, robustness, coherence, real-time and physical
distribution), and (3) dynamically adaptive behavior. Toward the goal of automated software
synthesis, we define a model for development of complex system software. In our model,
the phases of complex systems engineering are ongoing and cyclic, and include specification
of requirements; reverse engineering of existing modules and data structures; reengineering
of existing systems' designs and implementations; computation of module to table binding
strengths, module frequencies, and other metrics necessary for system optimization and
assessment; clustering of modules and tables based on bindings; partitioning of clusters among
processing elements; assessment of configuration for conformity to requirements and
optimality; monitoring and incremental adaptation. We address these concerns in an integrated
framework based on a system description language called RT-Chart.


1  Introduction 

Protoplasts  of  the  complex  systems  of  today  are  found  among  considerably  simpler  Navy  and 
(non-Navy)  real-time  computer  control  systems  of  the  1940s.  Since  the  1940s  and  until  the  late 
1980s,  these  systems  have  evolved  as  what  has  been  understood  as  traditional  or  conventional 
real-time systems. A traditional real-time system is relatively small (or consists of a relatively
small  number  of  logical  components)  and  static  in  nature.  The  structure  of  such  a  system 
is  typically  either  a  cyclic  executive  or  a  relatively  small  number  of  independent,  coarse-grain 
processes.  The  system  is  executed  on  a  small  number  of  processors.  In  addition  to  the  processors 
(and  their  associated  computer  resources,  such  as  memories  and  peripherals),  the  system  makes 
use  of  a  relatively  small  number  of  fairly  homogeneous  non-computer  resources  (such  as  sensors 
or  actuators).  Finally,  while  traditional  real-time  applications  have  needed  to  satisfy  fault- 
tolerance  and  other  non-functional  requirements,  the  mechanisms  for  incorporating  these  into 
the  corresponding  computer  systems  have  been  relatively  straightforward  (such  as  triple-modular 
redundancy). 

While older Navy applications have necessitated real-time systems of the traditional kind,
modern Navy systems are considerably more complex. Typical among modern applications,
such as AEGIS, are (1) integration of multiple existing large systems, as well as development of
new systems and subsystems, (2) complex and often conflicting objectives (security, robustness,
coherence, real-time and physical distribution), and (3) dynamically adaptive behavior.
Consequently, the computer systems that control these applications are required to act accordingly.
Specifically, the resulting systems often are very large, and are expected to adapt in a timely,
rapid and correct fashion to frequently changing environment variables and conditions. The
systems are expected to run on modern computer architectures, which often are (highly)
parallel and utilize many heterogeneous resources. Another aspect of such systems is that they are
expected  to  exist  for  decades,  due  in  part  to  the  tremendous  cost  of  their  development.  Finally, 
the  systems  need  to  incorporate  —  in  accordance  with  the  requirements  of  the  applications  they 
control  —  a  wide  variety  of  often  conflicting  functional  and  non-functional  objectives.  Given 
all  these  requirements,  it  is  natural  to  refer  to  these  systems  as  not  merely  real-time,  but  as 
complex. 

In  this  article  we  present  a  methodology  for  engineering  complex  computer  systems.  The 
methodology  views  a  complex  system  as  an  aggregate  of  software  components  (processes  and 
objects).  The  methodology  is  based  on  (1)  a  system-,  process-  and  object-oriented  language  for 
specification,  design  and  implementation  [11, 9],  (2)  an  umbrella  of  conceptual  system  views  (one 
for  real-time,  another  for  fault-tolerance  and  so  forth),  (3)  a  software  component  manager  [3,  5], 
and  (4)  a  set  of  integrated  techniques  for  synthesizing  complex  systems  from  existing  and  new 
software  systems,  for  execution  on  distributed  and  parallel  hardware  platforms  [9].  Synthesis 
considers  the  constraints  of  the  integrated  multiple  view  requirements,  while  searching  for  high 
quality clusterings and assignments of modules to processors. For software systems (or
subsystems) being implemented, alternate software component implementations are considered to
determine  the  most  suitable  for  the  application.  Additionally,  alternate  hardware  configurations 
are  considered. 

The  remainder  of  this  article  is  organized  as  follows.  In  Section  2  we  define  a  methodology  for 
complex  computer  systems  engineering.  Section  3  illustrates  the  methodology  with  the  AEGIS 
system.  Finally,  Section  4  summarizes  what  has  been  achieved  and  provides  directions  for  future 
work in this promising technology.


2  Complex  Computer  Systems  Engineering 

The  first  portion  of  this  section  provides  a  complex  systems  engineering  framework  that  resulted 
from  studying  the  development  process  employed  in  the  AEGIS  system.  The  second  portion  of 
the  Section  presents  a  complex  systems  engineering  model  that  captures  this  framework.  The 
section  concludes  with  the  presentation  of  a  system  description  language  that  allows  the  model 
to  be  put  into  operation  in  an  organized  fashion  in  any  complex  system  engineering  project. 


2.1  A  Framework  for  Complex  Systems  Engineering

A complex system is a long-lived and evolving system of systems, having functional requirements
and  non-functional  requirements.  Complex  systems  require  coherent  behavior  at  the  macro  level. 
In  fact,  we  assert  that  determinism  is  only  needed  at  the  complex  system  level,  in  contrast  to 
the  contemporary  real-time  lore,  which  dictates  that  all  components  of  a  computer  system  must 
be  deterministic  in  order  for  the  complete  system  to  be  deterministic.  Additionally,  there  must 
be  a  high  degree  of  distribution  of  the  systems  and  their  components,  yet  at  the  same  time  there 
must be a coordination of the systems' activities so that they are smoothly connected. The
coordination task is further complicated by the fact that a complex system must exhibit robustness
in the presence of faults, maintaining reasonable response times and acceptable functionality.
Thus, there is a need to manage redundant subsystems.

We also believe that new classifications are needed for real-time systems, so that the needs
of  complex  systems  can  be  expressed.  While  in  many  real-time  systems  there  are  processes  with 
genuine  hard  deadlines,  the  traditional  notion  of  hard  real-time  —  which  states  that  no  process 
can  ever  miss  a  deadline  —  is  an  idealistic  view  of  reality.  We  define  hard  real-time  systems 
to  be  those  in  which  (1)  the  physics  of  the  problem  domain  defines  the  timelines,  (2)  timing 
constraints  must  be  guaranteed  without  exception,  and  (3)  there  is  no  slack  for  timeliness. 
Hard real-time requirements occur in sensor and actuator systems. Soft real-time systems are
those  in  which  (1)  there  is  coherent  behavior  at  the  macro  level,  (2)  timing  constraints  must  be 
guaranteed  without  exception,  and  (3)  timeliness  must  be  within  specified  bounds.  Soft  real-time 
requirements  occur  in  macro-level  (complex)  control  systems.  A  new  class  of  real-time  systems 
is  defined  to  be  easy,  in  which  (1)  there  is  reasonable  performance,  (2)  exceptions  on  timing 
guarantees  are  permitted,  and  (3)  there  are  no  timeliness  requirements.  Auxiliary  systems  are 
classified  as  easy  real-time. 
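
The three classes can be told apart by two of the criteria listed above: whether exceptions to timing guarantees are permitted, and how much slack the timeliness requirement leaves. The Python sketch below encodes exactly that reading of the definitions; the type and function names are hypothetical, and combinations that the text does not define are rejected rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum


class Timeliness(Enum):
    NO_SLACK = "no slack for timeliness"
    WITHIN_BOUNDS = "timeliness within specified bounds"
    NONE = "no timeliness requirements"


class RealTimeClass(Enum):
    HARD = "hard"
    SOFT = "soft"
    EASY = "easy"


@dataclass(frozen=True)
class TimingProfile:
    exceptions_permitted: bool  # may a timing guarantee ever be violated?
    timeliness: Timeliness


# Only the three combinations defined in the text are listed.
_CLASSES = {
    (False, Timeliness.NO_SLACK): RealTimeClass.HARD,
    (False, Timeliness.WITHIN_BOUNDS): RealTimeClass.SOFT,
    (True, Timeliness.NONE): RealTimeClass.EASY,
}


def classify(profile: TimingProfile) -> RealTimeClass:
    """Map a timing profile to hard, soft, or easy real-time."""
    key = (profile.exceptions_permitted, profile.timeliness)
    if key not in _CLASSES:
        raise ValueError("combination not covered by the three classes")
    return _CLASSES[key]


sensor_loop = classify(TimingProfile(False, Timeliness.NO_SLACK))
macro_control = classify(TimingProfile(False, Timeliness.WITHIN_BOUNDS))
auxiliary = classify(TimingProfile(True, Timeliness.NONE))
```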

Complex  systems  engineers  also  require  the  ability  to  "throttle  down"  the  system,  by  shedding 
tasks  of  low  importance  during  times  of  overload.  It  is  frequently  the  case  that  the  exact  mix  of 
tasks  in  a  complex  system  cannot  be  determined  statically.  Thus,  some  system  resources  may 
become saturated and a dynamic response to the situation is warranted. Allowing system
developers to specify the policy for handling overload simplifies their task.
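
One simple overload policy of this kind can be sketched as follows. This is only an illustration under assumptions of our own: each task carries a hypothetical integer importance and a utilization fraction, and the policy greedily admits tasks in order of importance until the resource capacity is reached. A real policy could just as well stop at the first task that does not fit, or degrade tasks instead of dropping them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    importance: int     # higher means more important (hypothetical scale)
    utilization: float  # fraction of the resource the task consumes


def shed_load(
    tasks: Sequence[Task], capacity: float = 1.0
) -> Tuple[List[Task], List[Task]]:
    """Admit tasks by descending importance; shed those that do not fit."""
    kept: List[Task] = []
    shed: List[Task] = []
    used = 0.0
    for task in sorted(tasks, key=lambda t: t.importance, reverse=True):
        if used + task.utilization <= capacity:
            kept.append(task)
            used += task.utilization
        else:
            shed.append(task)
    return kept, shed


workload = [
    Task("logging", importance=1, utilization=0.1),
    Task("track", importance=10, utilization=0.5),
    Task("display", importance=3, utilization=0.3),
    Task("engage", importance=9, utilization=0.3),
]
kept, shed = shed_load(workload, capacity=1.0)
```

In this example the display task is shed because it no longer fits once the two most important tasks are admitted, while the small logging task still fits in the remaining capacity.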

Complex systems must be synthesized from many autonomous hardware and software
systems, since they bring together many resources that must be joined into a coherent and efficient
whole. Toward the goal of automated software synthesis, we define a model for development of
complex system software. In our model, the phases of complex systems engineering are ongoing
and cyclic, and include the following: specification of requirements; reverse engineering of
existing modules and data structures; reengineering of existing systems' designs and implementations;
computation of module to table binding strengths, module execution frequencies, and other
metrics necessary for system optimization and assessment; clustering of modules and tables based
on bindings, and partitioning of clusters among processing elements; assessment of conformity
to requirements, optimality, load balancing, etc.; monitoring and incremental adaptation. The
remainder of this Section briefly discusses each of these items.


2.2  Requirements  Specification 

During  the  first  phase,  the  requirements  of  the  system  are  specified  [4].  This  is  accomplished 
by  stating  the  required  functionality  [4],  as  well  as  non-functional  requirements  such  as  timing 
behavior, fault tolerance and security. During this phase, necessary system features are stated in
implementation-independent terms, using a technique such as system design factors (SDF) [1].
For example, requirements may state that the data from a particular sensor should be sampled
once per second, transformed via FFT, and matched against a set of relevant patterns, with a
match causing the task to send a message to a handler routine. Additionally, the reliability
of the task could be stated as 0.11 probability of failure, and the output of the task could have
the security classification of secret.


2.3  Reverse  Engineering  and  Reengineering 

Requirements specification is followed by the task of reverse engineering, which captures the
designs  of  existing  systems.  The  goal  of  this  phase  is  to  identify  the  essential  features  of  the 
systems,  as  defined  by  the  reengineering,  design,  implementation,  and  optimization  phases. 
The  reverse  engineered  design  must  allow  reasoning  about  design  and  implementation  tradeoffs. 
Thus,  it  must  be  multiple  leveled,  supporting  a  design-time  view  and  an  implementation-time 
view, with perhaps multiple views within each of these to deal with specific design and
implementation attributes (such as timing, cost, dependability or parallelism). Reverse engineering is
a very difficult task to perform, since the system to be reverse engineered may be implemented
using non-structured programming and low-level languages (such as assembly language).
Furthermore, the system is likely to be implemented in multiple languages. The use of language
constructs  such  as  pointers  and  memory  overlays  further  complicates  design  capture.  Reverse 
engineering  is  a  multiple  pass  activity,  proceeding  from  analysis  of  low  level  details,  to  synthesis. 
Analysis  begins  with  the  examination  of  single  program  statements  and  data  elements.  Synthesis 
builds  on  the  analysis  results  by  examining  intercomponent  relationships  (such  as  a  statement 
accessing  a  data  element).  The  result  of  synthesis  is  an  aggregation  of  the  components  (into 
units  such  as  procedures).  Successive  passes  of  synthesis  result  in  larger  aggregate  components 
or  in  additional  intercomponent  relationships,  since  the  output  of  one  or  more  synthesis  phases 
serves  as  the  input  to  a  later  synthesis  phase. 

The  goal  of  reengineering  is  to  produce  a  system  design  and  corresponding  implementation 
that satisfy the requirements. It is performed with the goals of (1) correctness (such as deadlock
and race condition avoidance and synchronization of access to data shared among processes
generated by the reengineering), (2) efficiency (via techniques such as parallelism and code cloning),
(3) analyzability and testability (for all system requirements, including functionality and timing,
even in the presence of aliasing and unbounded loops), (4) portability (thus avoiding platform
specifics), and (5) maintainability and adaptability.


2.4  Configuration,  Optimization,  Assessment,  and  Metrics  Collection 

There are many degrees of freedom for which choices must be made in the design, implementation
and  configuration  of  a  complex  system.  The  configuration  refers  to  the  choices  of  processors 
and  memories,  their  interconnection,  and  the  distribution  of  software  components  and  data 
among  the  processors  and  memories.  The  choices  made  affect  the  quality  of  the  resultant 
system,  in  terms  of  meeting  constraints  and  also  in  terms  of  the  closeness  to  optimality.  Due 
to  the  complexity,  selection  of  choices  cannot  be  performed  manually.  Note  also  that  there 
are  constraints  of  varying  classes.  User-defined  constraints  (such  as  deadlines  and  periods  for 
processes)  are  fixed  and  cannot  be  altered  by  the  optimizer.  Configuration-defined  constraints 
are flexible, and include items such as processor and link speeds, interconnection topology, and
software  component  versions.  Configuration  and  optimization  may  consider  the  following  degrees 
of  freedom:  clustering  of  modules  and  data  to  be  assigned  to  a  processor  as  a  unit;  software 
component to processor assignment [10, 12]; hardware configuration; load balancing; parallelism;
communication; and software component version selection [3, 5]. A multipass configuration and
optimization  strategy  is  appropriate  for  complex  systems  due  to  the  magnitude  of  the  problem. 
For  example,  in  one  pass  a  clustering  of  modules  based  on  one  criterion  such  as  communication 
binding  can  be  performed  using  a  fast,  coarse  greedy  algorithm.  In  a  second  pass,  the  clusters 
can  be  assigned  to  processors  using  an  accurate,  detailed,  high  quality  optimization  technique 
such  as  simulated  annealing  or  neural  networks.  The  first  pass  has  the  effect  of  reducing  the 
complexity  of  the  assignment  problem,  since  the  number  of  units  to  be  assigned  has  been  reduced 
significantly. 
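
The two-pass idea can be sketched in a few lines of Python. The first pass below is a coarse greedy clustering on binding strength, as described above. For the second pass the text suggests simulated annealing or neural networks; to keep the sketch short and deterministic, a plain "heaviest cluster onto the least loaded processor" heuristic is swapped in instead. All names, the cluster size cap, and the integer load figures are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_by_binding(
    modules: Sequence[str],
    binding: Dict[Tuple[str, str], int],
    max_size: int,
) -> List[List[str]]:
    """Pass 1: merge the most strongly bound modules first, up to max_size."""
    cluster_of = {m: {m} for m in modules}
    strongest_first = sorted(binding.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _strength in strongest_first:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is not cb and len(ca) + len(cb) <= max_size:
            ca |= cb  # merge cb into ca, then repoint cb's members
            for m in cb:
                cluster_of[m] = ca
    clusters: List[set] = []
    for m in modules:  # collect each distinct cluster once, in module order
        if all(cluster_of[m] is not c for c in clusters):
            clusters.append(cluster_of[m])
    return [sorted(c) for c in clusters]


def assign_clusters(
    clusters: Sequence[List[str]], load: Dict[str, int], n_processors: int
) -> Tuple[List[List[List[str]]], List[int]]:
    """Pass 2 (stand-in heuristic): heaviest cluster to least loaded processor."""
    totals = [0] * n_processors
    placement: List[List[List[str]]] = [[] for _ in range(n_processors)]
    heaviest_first = sorted(
        clusters, key=lambda c: sum(load[m] for m in c), reverse=True
    )
    for cluster in heaviest_first:
        p = min(range(n_processors), key=lambda i: totals[i])
        placement[p].append(cluster)
        totals[p] += sum(load[m] for m in cluster)
    return placement, totals


modules = ["radar", "track", "plan", "fire", "log"]
binding = {
    ("radar", "track"): 9,
    ("track", "plan"): 7,
    ("plan", "fire"): 8,
    ("fire", "log"): 1,
}
load = {"radar": 30, "track": 20, "plan": 20, "fire": 10, "log": 10}  # percent

clusters = cluster_by_binding(modules, binding, max_size=2)
placement, totals = assign_clusters(clusters, load, n_processors=2)
```

Here five modules collapse into three clusters before assignment, so the second pass, whatever technique it uses, searches a smaller space than it would on the raw modules.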

To determine the acceptability of a particular system configuration, and to guide the
optimization phase, the system's characteristics are assessed with respect to the system
requirements [2, 6, 7, 8, 9, 11]. Conformity to timing, dependability, security and other requirements
is checked. A detailed set of techniques for checking timing conformity of complex systems
is presented in [9, 11]. These references present techniques for predicting response times of
independent real-time processes that are distributed over many processing elements (PEs). The
techniques estimate contention for PEs (CPUs) and network communication links. The
contention is combined with utilizations (based on frequency of execution) to compute a rate of
progress experienced by clients of each device. Response times are computed as the product
of rate of progress and computational demand. The predictions have recently been enhanced
to  apply  to  systems  in  which  asynchronous  remote  procedure  calls  are  allowed,  thus  permitting 
parallelism  within  a  task  to  be  modeled  accurately. 

To  evaluate  the  quality  of  a  particular  set  of  system  designs  and  implementations,  under 
a  chosen  configuration,  the  values  of  various  system  attributes  need  to  be  collected  [9,  10, 
11]. The collection of the metrics is (in part) a static process, wherein compiler-based tools
extract information such as the set of global tables read or written by each procedure, or the
interprocedure call graph. Additional information is collected by monitoring the behavior of the
system during execution. The dynamic monitoring can provide such valuable information as the
amount of data read from a particular global table by a given procedure, or the probability of
taking a particular branch of a conditional.
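
To make the static part of metrics collection concrete, the sketch below extracts an interprocedure call graph from source text using Python's standard `ast` module. It is purely illustrative: the tools referred to above operate on the languages of the systems being engineered, whereas this stand-in analyzes Python source, handles only direct calls by simple name from top-level functions, and uses a made-up three-function sample.

```python
import ast
from typing import Dict, Set


def call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(source)
    graph: Dict[str, Set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees: Set[str] = set()
            for inner in ast.walk(node):
                # Only calls of the form name(...) are recorded.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = callees
    return graph


SAMPLE = '''
def read_sensor():
    return 1

def update_track():
    return filter_noise(read_sensor())

def filter_noise(x):
    return x
'''

graph = call_graph(SAMPLE)
```

Dynamic metrics such as branch probabilities cannot be obtained this way and would require instrumenting or monitoring the running system, as the text notes.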

2.5  System  Description  Language 

To  meet  the  challenge  of  complex  systems  engineering,  we  propose  an  integrated  multi-view 
methodology.  The  notion  of  multiple  views  is  not  new.  In  particular,  five  essential  conceptual 
views  are  presented  in  [1].  However,  our  views  are  operational  (they  correspond  to  operational 
requirements) rather than conceptual, and include (but are not limited to) the following views.

1.  The  Functional  view  presents  a  system  in  terms  of  active  processes  and  their  dependencies 
(through usage of passive resources and direct interactions).

2.  The  Timing  view  presents  time-constraints  of  each  process,  and  each  resource  or  interaction 
requirement  (of  an  action).  For  instance,  a  particular  process  may  be  strictly  periodic  with 
a  deadline  at  the  end  of  each  period  and  may  require  a  particular  resource  for  a  certain 
amount  of  time  in  every  period. 

3.  The Fault-Tolerance view presents the fault-tolerance and reliability requirements of each
process  and  each  resource  usage  or  interaction.  For  instance,  a  particular  process  may 
need  to  be  replicated  and  run  on  two  physically  separated  CPUs. 

4.  The  Security  view  presents  security  requirements  of  the  system.  For  instance,  it  may 
require  a  particular  degree  of  clearance  to  access  a  particular  resource. 

As  the  development  of  a  system  matures  —  throughout  specification,  design,  implementation, 
maintenance,  and  even  during  execution  of  dynamically  adapting  systems  —  the  treatment  of 
each operational requirement will naturally include increasingly more complexity. To
accommodate such complexity, the corresponding view will support a hierarchical definition of the
operational view of the requirement. To express the view, a rigorous description language called
RT-Chart is provided.

Each of the operational views can be mapped onto and from each conceptual view (though
not every map may necessarily be equally useful). In judging an overall design, it is in fact
useful to map different operational views onto and from a corresponding conceptual view. We
will provide "useful" maps from operational views onto and from conceptual views. Note that
RT-Chart can serve as an intermediate form for which analysis and optimization techniques
exist [11, 9]; however, the actual language used to express the system properties may be RT-Chart
but need not be RT-Chart. One can define maps from other description languages to RT-Chart,
and  the  mapping  can  be  automated.  In  fact,  we  have  defined  maps  from  Ada-package-based 
and  object-oriented  systems  to  RT-Chart. 

Different  requirements  of  a  system  may  be  addressed  by  different  teams  of  engineers.  Each 
team  should  thus  typically  develop  and  appreciate  its  own,  specific  operational  system  view. 

Figure  1:  AEGIS  Combat  System.


To  integrate  and  maintain  consistency  among  the  different  operational  views,  another  set  of 
maps  —  this  time  from  one  operational  view  and  its  corresponding  description  language  into 
another — are  provided,  along  with  consistency  checking  rules.  The  actual  implementation  of 
these  maps  uses  the  maps  between  operational  and  conceptual  views.  Furthermore,  to  support 
the hierarchy within each operational view, the hierarchical maps are used along with additional
mapping  rules  to  ensure  that  maps  across  methodological  views  involve  “compatible”  levels  of 
the  hierarchy. 

RT-Chart  specifications  represent  a  system  as  a  set  of  periodic  and  event-drive  processes, 
each  composed  of  a  set  of  actions.  The  first  action  of  a  process  is  performed  at  the  start 
of  each  process  activation,  as  determined  by  the  beginning  of  the  process’s  period  or  by  the 
occurrence  of  the  event  driving  the  process.  Upon  completion,  the  initial  action  invokes  a 
successor  action,  passing  some  data.  RT-Chart  provides  a  resource  algebra  for  stating  the  modes 
in which resources can be used: (1) may be used concurrently, (2) must be used concurrently, (3)
cannot be used concurrently, and (4) cannot be used concurrently and must be used in a particular
order. Additionally, and-gates allow the specification of parallel actions, and or-gates indicate
conditional execution of one among a set of actions. Furthermore, in RT-Chart, the actions may
be hierarchical, to enable macro-level reasoning and specification. We have found it useful to
describe detailed implementations of RT-Chart actions as objects and packages. In addition to
functionality, timing and parallelism, there are other important system aspects that RT-Chart
allows  to  be  expressed.  Security  classification  levels  can  be  indicated  for  information  flows,  code 
(actions),  resources,  levels  of  hierarchy,  and  implementation  details.  To  allow  dependability 
to  be  dealt  with,  degree  of  redundancy  or  reliability  can  be  specified  for  actions  or  processes. 
Another  aspect  of  systems  is  relative  criticality,  which  can  also  be  expressed  for  actions  and 
processes. 
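
The four resource-usage modes described above can be pictured as a small expression tree. The following Python sketch is illustrative only: the names `Mode`, `Res`, `Usage`, `resources` and `may_overlap` are assumptions made for this example and are not RT-Chart syntax, and the example expression treats "used concurrently" as the "may be used concurrently" mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set, Tuple, Union


class Mode(Enum):
    """The four usage modes listed in the text (names are hypothetical)."""
    MAY_CONCURRENT = 1    # (1) may be used concurrently
    MUST_CONCURRENT = 2   # (2) must be used concurrently
    EXCLUSIVE = 3         # (3) cannot be used concurrently
    SEQUENTIAL = 4        # (4) cannot be used concurrently, fixed order


@dataclass(frozen=True)
class Res:
    """A single named resource."""
    name: str


@dataclass(frozen=True)
class Usage:
    """A usage mode applied to sub-expressions (resources or nested usages)."""
    mode: Mode
    parts: Tuple["Expr", ...]


Expr = Union[Res, Usage]


def resources(expr: Expr) -> Set[str]:
    """Collect the names of all resources mentioned in an expression."""
    if isinstance(expr, Res):
        return {expr.name}
    names: Set[str] = set()
    for part in expr.parts:
        names |= resources(part)
    return names


def may_overlap(expr: Expr, a: str, b: str) -> bool:
    """Return True if distinct resources a and b may be in use at the same time."""
    if isinstance(expr, Res):
        return False
    where_a = [i for i, p in enumerate(expr.parts) if a in resources(p)]
    where_b = [i for i, p in enumerate(expr.parts) if b in resources(p)]
    for i in where_a:
        for j in where_b:
            if i == j:
                # Both resources sit in the same sub-expression: look inside it.
                if may_overlap(expr.parts[i], a, b):
                    return True
            elif expr.mode in (Mode.MAY_CONCURRENT, Mode.MUST_CONCURRENT):
                return True
    return False


# Hypothetical action: two resources used concurrently, then a third one.
step = Usage(Mode.SEQUENTIAL, (
    Usage(Mode.MAY_CONCURRENT, (Res("radar unit"), Res("trackfile"))),
    Res("signature DB"),
))
```

A scheduler or consistency checker could use a query such as `may_overlap(step, "radar unit", "trackfile")` to decide whether two actions competing for the same resources can be placed on different processors.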


3  A  Military  Example 

The United States Navy has successfully developed and deployed complex systems. An example
is the AEGIS combat system (see Figure 1). AEGIS is a complex system, composed of
many systems. The main categories of systems composing AEGIS are (1) detect, (2) control,
and (3) engage. The detect category consists of the LAMPS, surface radar, identification,
electronic sensing, navigation, and sonar systems. The control category of systems includes C&D,
WCS, SPY radar, ACTS, and ORTS. In the engage category of systems, there are the LAMPS,
electronic warfare, GUN weapon, fire control, vertical launching, advanced Tomahawk weapon
control, Phalanx weapon, and underwater fire control.

Figure 2: Auto-special control.

To illustrate the methodology of the previous section, consider the anti-air warfare (AAW)
engagement control system of AEGIS. It provides the capability of automatic identification and
engagement of quickly evolving threats. Obviously, these tasks must be performed according to
rigid timing requirements, with little variance. An early indication of possible quick reaction
targets  is  given  by  the  radar  system.  Detection  of  such  targets  triggers  a  high  priority  set 
of  actions,  consisting  of  missile  preparation,  further  classification  of  threat,  missile  firing  and 
guidance  (or  abortion  of  firing  sequence  if  the  target  is  determined  to  be  nonhostile). 

This  set  of  actions  can  be  modeled  in  an  RT-Chart  specification  as  shown  in  Figure  3.  The 
process  is  activated  by  a  radar  signal  (S)  arriving  at  the  detect  action.  To  detect,  the  radar 
unit,  signature  DB,  and  trackfile  are  used  sequentially  (as  indicated  by  the  “o”  resource  usage 
operator).  Upon  detection,  a  triple  (S,  T,  M)  is  sent  to  the  &-gate  which  forks  control  to 
concurrent  actions.  The  triple  contains  information  describing  the  radar  signal  (S),  the  track 
identifier (T) and the missile launcher and missile type (M) to be used. The prepare missile action
can  use  the  trackfile  and  launcher  concurrently.  The  verify  threat  action,  running  concurrently 
to  the  prepare  missile  action,  uses  the  radar  unit  and  the  trackfile  concurrently  and  then  uses 
the  signature  DB.  Threat  verification  sends  the  value  (V) — a  confirmation  or  a  negation  of  the 
preliminary  detection — to  the  launch  and  guidance  action.  The  launch  and  guidance  action  uses 
the launcher and trackfile concurrently, and then uses the missile and trackfile concurrently.

Following  the  firing  of  a  missile,  uplink  commands  are  periodically  sent  to  the  missile  to 
control  its  intercept  trajectory.  The  appropriate  trajectory  is  determined  by  considering  the 
speeds  and  positions  of  the  target  and  the  missile,  and  extrapolating  the  position  of  the  target. 
The  missile  guidance  action  breaks  down  further,  into  a  set  of  actions:  measure  target  and 
missile  positions,  calculate  missile  correction  factor,  build  uplink  command  to  be  sent  to  missile, 
transmit uplink command to missile. The missile guidance must be performed periodically
to allow the missile to intercept the desired target. The deadline for the uplink command to
be received by the missile is stringent, since it is programmed to self-destruct if it receives
no such command (to avoid destroying the wrong object). Missing such a deadline is highly
undesirable, since it results in the waste of a missile and also places the combatant at considerable
risk. Parallel processing is appropriate to meet such timing requirements, since there may be
multiple missiles in flight and since the speed of missiles is ever improving and therefore timing
requirements become more stringent.

Figure 3: Missile control process.
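
To make the argument for parallel processing concrete, the following Python sketch checks whether one period's worth of guidance work for several missiles fits within the uplink deadline. It is a rough illustration under stated assumptions: the execution times, the deadline, and the names `GUIDANCE_ACTIONS_MS`, `chain_time_ms` and `meets_deadline` are all hypothetical, and the scheduling model (each missile's four sub-actions run back to back on one processor, missiles spread evenly over processors) is a simplification, not the analysis used for AEGIS.

```python
import math

# Hypothetical worst-case execution times, in milliseconds, for the four
# guidance sub-actions named in the text.
GUIDANCE_ACTIONS_MS = {
    "measure target and missile positions": 4.0,
    "calculate missile correction factor": 6.0,
    "build uplink command": 2.0,
    "transmit uplink command": 3.0,
}


def chain_time_ms(actions=GUIDANCE_ACTIONS_MS) -> float:
    """Time to run the four sub-actions back to back for one missile."""
    return sum(actions.values())


def meets_deadline(missiles: int, processors: int, deadline_ms: float) -> bool:
    """Check whether every missile receives its uplink command in time.

    Assumes each processor handles its share of missiles one after another.
    """
    if processors < 1:
        raise ValueError("at least one processor is required")
    if missiles <= 0:
        return True
    rounds = math.ceil(missiles / processors)
    return rounds * chain_time_ms() <= deadline_ms
```

With these assumed numbers, four missiles on a single processor need 60 ms and miss a 50 ms deadline, while two processors finish in 30 ms.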

The launch and guidance action can be modeled in an RT-Chart specification as shown in
Figure  3.  The  task  performed  is  either  (as  indicated  by  the  !-gate)  an  abortion  of  launch  if  the 
hostile  target  was  not  verified,  or  is  a  launch  and  successive  guidance  of  the  missile. 


4  A  Last  Word 

The engineering of complex systems requires the integrated solution of problems related to
parallel and distributed processing, real-time, security, and dependability. The framework, model,
and methodology embodied in this paper address these concerns by building on the experiences
of the AEGIS development team and the members of the Real-Time Computing Laboratory at
NJIT. The result is a framework and corresponding approach that allow complex systems to be
specified, designed, configured, evaluated and maintained.

In  our  work  so  far,  we  have  addressed  the  Functional  and  Timing  operational  views  and  have 
explored  their  relationship  with  the  Implementation  conceptual  view.  We  have  also  defined 
a specification language RT-Spec and a design and implementation language RT-Chart for
expressing operational and conceptual views. So far, we have not incorporated a hierarchy into the
Functional and Timing views nor into the RT-Spec/RT-Chart semantics, though work on this
is in progress. Additionally, the reengineering and metrics collection areas are in their infancy,
requiring much additional research. Another problem we are continuing to address is the
integration of the solution to the optimization problem into the complex systems model described
in this document. We are also exploring techniques for managing libraries of reusable software
components. Tools incorporating the techniques are also being evolved with the research.

We would like to thank all members of the SSSWG, NJIT RTCL members, and the HiPerD
team for the influences they have had on the ideas embodied in this document.

An Assessment Control Board (ACB)

and a

System Integration (SI) Program
as complements to

The Configuration Control Board (CCB)

Overview 

Systems development, especially in DoD programs, typically includes a basic program control
board known as the Configuration Control Board (CCB). CCB membership is often a
combination of both contractor and user/government personnel, or else each organization chairs
its own CCB, with participation/attendance open to the other. The roles of the CCB can vary
depending on whether it is developer or user/government managed; but in both organizations the
CCB has the common function of control of program configuration change, particularly for the
baseline documents that establish the requirements, design, and testing. A developer CCB
controls the change nomination. A user/government CCB controls the changes and their
associated funding and schedule adjustments, if any. CCBs act after-the-fact in the sense that
they receive formal change proposals in specific formats, some of which may have been in prior
preparation for months.

A key to achieving assessments is a management complement to the traditional Configuration
Control Board (CCB) that takes the form of an Assessment Control Board (ACB). An ACB is a
complementary and contrasting program control board that has been applied on several major
system [software and hardware] development projects. A summary of the complementary nature
of, and contrast between, an ACB and the traditional CCB might be equated to the contrast between
talent [CCB] and tact [ACB] in an anonymous quote published in McGuffey’s Sixth Eclectic
Reader, Van Nostrand, circa 1878, pp. 113:


Talent and Tact

"Talent is power, tact is skill. Talent has weight, tact is momentum. Talent knows what
to do, tact knows how to do it. Talent is wealth, tact is ready money. Talent sees its way
clearly, but tact is first at its journey's end. Talent convinces, tact converts. Take them
to court and talent feels its weight, tact finds its way. Talent commands, tact is obeyed.
Unless they are combined we have successful pieces which are not respectable, and
respectable pieces which are not successful."

The thesis of this paper is that there is a need for both talent and tact: both assessment control
[ACB] and configuration control [CCB]. Assessment makes discoveries; the CCB disciplines
the application of those discoveries. Assessment anticipates and plans; the CCB operates
after-the-fact and regulates. While CCBs are as essential as talent, they are equally in need of the
balance of tact [assessment]. CCBs alone can be deficient in three ways:

1. The CCB members involve themselves more in the detailed change proposed rather
than in the control of the process;

2. The CCB's scope is typically inappropriately constrained to program change, and that
change is all too often, when presented, essentially de facto, as the majority of the investigation
resources [both time and money] have by then been used; and

3.  The  CCB  does  not  meet  early  enough  to  recognize  and  address  risks  as  they  develop 
and  are,  at  that  point  in  time,  still  controllable  [time  is  the  development's  one  inelastic 
resource]. 

An  ACB,  on  the  other  hand,  would  meet  weekly,  and  only  briefly,  and  focus  on  the  control  of 
the  assessments,  in  the  form  of  one-page  Assessment  Plans  (APs)  and  the  results  that  are  also 
one-page Assessment Reports (ARs). The ACB focus on assessments would contrast with and
complement  the  CCB  approval  of  proposed  changes.  As  illustrated  in  Figure  1,  the  ACB 
controls  the  work  before  it  ever  starts;  the  CCB  controls  the  implementation  of  change,  where 
the  design  of  the  change  has  already  occurred. 


Figure  1  ACB  and  CCB  Operations 

Assessment  Plans  (APs) 

Assessment Plans (APs) are one page, with the following attributes:

1. Scope

The work and the associated products to be assessed.

2. Assessment Criteria

The criteria to be applied in assessing the work and the products. This is one of
the hardest elements of a plan to devise, and accordingly one of the most critical
program controls.

3. Approach

How the assessment itself will be assessed.

How the assessment will be conducted: the format and process.

Who will be on the "separate/independent" assessment team: their names.

4. Schedule and cost

The assessment milestones and the proposed investment in assessment.

The ACB, by approval of the one-page APs, exercises the essential "tact" influence of a project,
as a complement to the CCB "talent". By directing the plans for assessment, the ACB directs the
future, in addition to controlling it.

System  Integration  (SI)  Program 

A  parallel  development  concept  that  can  be  applied  to  strengthen  the  CCB  is  for  the  ACB  to  also 
sponsor  an  SI  program  for  CCB  control.  The  objective  of  an  SI  Program  is  to  assure  that 
proposed  changes  are  well  prepared  for  CCB  consideration.  Changes  may  be  changes  to  the 
configuration of the program architecture and schedule as well as a change to the design. There
are three primary dimensions of an SI program, collectively known as the three Is and
illustrated in Figure 2:

1.  Identification 

2.  Investigation 

3. Implementation

The driving influence of the SI Program is in the first two "Is": Identification and Investigation.
These are supported by a four-part SI Program structure that is illustrated in Figure 3:

1. Use of System Reports (SRs) as individually numbered records of every problem,
suggestion, insight, or idea. An SI Database is built on the ever-accumulating set of SRs.
SRs are recorded as symptoms, so to speak, without prejudice. They are not filtered by
any criteria, such as who reported them, how they were reported, or whether they were
validated. They are accumulated and honored by a unique SR Number that is never
reused. Thus, while the SR may be placed in an inactive file, its identity, its number,
always remains unique to that SR.

2. The use of non-representation Problem Area (PA) teams to assess the overall program
handling of the SRs. The team members are drawn from both the user/government and
the developer and serve as professional collateral assignments and not as representatives
of their parent organization's management priorities or interests. The PA Teams assess;
they do not have responsibility for solutions. They recommend initiatives, but they do
not sponsor changes to the CCB, with the attendant responsibility to implement approved
changes. The PAs monitor the process, both its design and operation.

3. Candidate Program Initiatives (CPIs). CPIs are temporary homes for potential
program initiatives. CPIs are unfunded and without a designated management
responsibility. They are the initial planning framework, neutral territory, for the
allocation of SRs. Note that SRs are allocated redundantly, with one primary
allocation and multiple secondary assignments.

4. Program Objectives (POs). POs are funded, have assigned implementation
responsibility, and are the formal vehicles for configuration change. POs are assembled
as the implementation packages from the array of CPIs. They may be one entire CPI or
include portions of many.
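
The bookkeeping implied by the four-part structure above can be sketched in a few lines of Python. This is only an illustration of the stated rules (SR numbers are never reused, SRs are never filtered out, and each SR has one primary and possibly several secondary CPI allocations); the class and method names `SystemReport`, `SIDatabase`, `record`, `deactivate`, `allocate` and `reports_for` are hypothetical and do not describe any actual SI Program tool.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, List, Optional


@dataclass
class SystemReport:
    """One SR: a recorded symptom, kept without filtering."""
    number: int
    text: str
    active: bool = True
    primary_cpi: Optional[str] = None
    secondary_cpis: List[str] = field(default_factory=list)


class SIDatabase:
    """Ever-accumulating set of SRs with numbers that are never reused."""

    def __init__(self) -> None:
        self._numbers = count(1)  # monotonically increasing SR numbers
        self.reports: Dict[int, SystemReport] = {}

    def record(self, text: str) -> SystemReport:
        sr = SystemReport(number=next(self._numbers), text=text)
        self.reports[sr.number] = sr
        return sr

    def deactivate(self, number: int) -> None:
        # The SR moves to the inactive file but keeps its number forever.
        self.reports[number].active = False

    def allocate(self, number: int, primary: str,
                 secondary: Iterable[str] = ()) -> None:
        sr = self.reports[number]
        sr.primary_cpi = primary
        sr.secondary_cpis = [c for c in secondary if c != primary]

    def reports_for(self, cpi: str) -> List[int]:
        """SR numbers allocated to a CPI, as primary or secondary."""
        return sorted(
            n for n, sr in self.reports.items()
            if sr.primary_cpi == cpi or cpi in sr.secondary_cpis
        )
```

A Program Objective could then be assembled by taking the union of `reports_for(...)` over the CPIs, or portions of CPIs, that it packages.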


The Investigation Process has a central feature of the use of Round Tables (RTs) to strengthen
the  investigation.  The  Investigation  Plan  (IP)  is  considered  by  a  panel  of  three  to  five  experts. 
The  RT  is  the  assessment  team  for  the  planned  investigation.  The  IP  includes  [as  item  4]  an 
Assessment  Plan  [approved  by  the  ACB]  as  a  key  part  of  the  Investigation  process.  The  IP 
defines,  for  the  RT  consideration,  the  following  top-level  attributes  of  the  planned  investigation: 

1. Problem/Opportunity. A summary of the problem or opportunity to be addressed.

2. Approach. The key features of the approach to the investigation effort.

3. Resources. Staff-Months, Schedule for the activity, and an estimate of the "ranges" of
the Implementation/Life Cycle Costs or Implementation Resources Requirements Range
(IRS) that span the high-low for three parameters: Duration [how long to build], CY
[calendar year for start] and costs.

4. Assessment. The assessment plan, identifying the proposed assessment team members,
the plans and products to be assessed [assessment milestones] and the principal criteria to
be applied in those assessments.

5. References. Available background material: what is already known about the problem
and alternative solutions and approaches.

6. Products and Schedule. The principal deliverables (the tasks to be accomplished) with
their milestone schedules.


The CCB controls the transition from Investigation to Implementation. By operating an SI
Program, the ACB supports the CCB with proposed changes that have been appropriately
considered, with up-front review by an RT, which can include a public "hearing" format where all
interested parties can be apprised of the planned investigations.

Summary

Assessment plans and reports are necessary complements to program control through CCBs.
CCBs are essentially totally after-the-fact. The resources [both time and money] to prepare the
proposed changes have already been invested by the time the CCB receives them for
consideration. The operation of an ACB is management working up front, where the leverage is
greatest. The ACB works with one-page Assessment Plan (AP) summaries for the planned
assessment of all work. The scope of some APs may be very large, others very small, as the
ACB approves. Some work may be small in dollar scope but large in significance. The ACB
influence on how work is to be assessed is a prime lever on what is to be done. The criteria for
"goodness" and the names of those who will prepare the assessment report are key management
influencers. The Assessment Report (AR) is generally succinct, not belabored: does the
work/product meet the pre-established criteria? The plan for the AR is itself part of the AP that the
ACB controls.

Operation of a planning process, called an SI Program, based on SRs as the primitives for all
planning, is a potential added aid to the CCB. With an SI Program, the CCB receives proposed
changes that have been identified and investigated with care, by pre-planned assessments.
Further, an SI Program works with bipartisan Problem Area (PA) Teams [that include both
customer and developer members, not as representatives of their organizations, but for their
expertise only] to monitor the implementation and the planning from the perspective of their
"problem area". They particularly concern themselves, working on a very low duty cycle (an
hour a week or less), with the planning to address their assigned SRs: what is being done with
the symptoms [a synonym for system report or SR] they have monitoring cognizance for.

Investigations  are  as  critical  as  the  assessment  of  assigned  work.  How  investigations  are  to  be 
accomplished  is  a  key  concern  of  the  ACB.  A  central  feature  of  the  Investigation  Plan  (IP)  is  the 
Assessment  Plan  (AP)  for  that  investigation.  A  built-in  feature  of  every  Investigation  AP  is  the 
use of a Round Table to provide up-front assessment of the planned investigation: an assessment
of the IP itself. The meeting of the RT members, one hour or less [and not necessarily
face-to-face, if preferred], assures that the planned investigation is well-considered. While the primary
need for an ACB type focus is on the plans for Investigations, nominated by one-page
Investigation Plans (IPs), an ACB-controlled operation of assessment teams could also be
effectively applied across the full span of the software lifecycle. The idea is that, at least in
addition  to,  [and  possibly  in  some  cases  even  in  lieu  of]  the  after-the-fact  milestones  of  System 
Design  Reviews  (SDRs)  and  Preliminary  Design  Reviews  (PDRs)  there  could  be  a  concentrated 
focus where the leverage is greatest: on the plans for the conduct of each phase.

Abstract: The emergence of the object philosophy in
the new software development techniques gave birth to
many object models. The object-oriented approach
enables the improvement of software quality, the
reduction of future maintenance requirements, the reuse
and the adaptation of specification and developments.
However the difficulty lies in the transition between the
conceptual specification and the implementation because
of the disparity of the formalism proper to each level. To
resolve the problem, we propose an object oriented
interface supported by a software tool and based on a
pivot model and a set of mapping rules. This research has
been developed within the framework of the ESPRIT II
project named Business Class¹.

Keywords: object-oriented analysis, object-oriented
design, software engineering environment, information
systems development

1-  INTRODUCTION: 

The Object-Oriented approach is emerging in a
number of data processing domains, such as
programming, software engineering, data base,
DBMS, analysis and design of data base and
information system. The paradigms underlying
object-oriented computation are stabilized
enough to consider that they are providing a
unifying approach for information system
development.

Object-oriented requirements, analysis and
specification models are henceforth the subject of
an active research effort. Indeed, object-oriented
design techniques (e.g. [Booch87], [Heitz89],
[Meyer90], [Wirfs-Brocks90]) do not resolve all the
problems of object-oriented development in the


¹ This project is supported by the European Commission
under the contract 5311 of the second European Strategic
Program for Research and Development in Information
Technology (ESPRIT). The main partners include
Télésystèmes (France), Société des Outils du Logiciel
(France), Applied Logic (United Kingdom), Elite] (Spain) and
Datamont (Italy). Université de Paris 1 is involved in the
development of the analysis environment and the
integration of the tool in the PCTE-based software
engineering environment.


information system domain. Object-oriented design
deals with the solution space, the "how" of
technical design, and generally emphasizes the
organization and reuse of code. Object-oriented
analysis is concerned with the problem space, the
"what" of requirements specification, and
centers on the semantics of the phenomena,
postponing technology-dependent choices.
Object-oriented design methodologies focus
on system design as a later stage of the application
life cycle, implying that the earlier stages, leading
to requirements specification and conceptual
design, have been performed.

Object-oriented  analysis  methodologies  are  still 
under  investigation.  Three  main  approaches  are 
being  proposed: 

- the functional approach uses traditional DFD
based techniques to derive object specification

- the data driven approaches are influenced by
E/R modelling to define objects

- the object based approaches recommend the
use  of  the  object  concept  right  from  the  beginning  of 
the  system  life  cycle.  The  concept  of  object  is  then 
the  basic  element  the  system  relies  on. 

The  claim  of  these  approaches  is  that 
enhancements  and  extensions  of  the  computational 
object  concept  are  required  to  make  it  relevant  to 
conceptual  modelling. 

O* [Brunet91], MCO [Castellani93], (OOD, GOOD)
[Booch86,87], HOOD [Heitz89], OMT [Rambaugh &
al.91], OOA [Coad & Yourdan91], and OOSA
[Shlaer & Mellor91] are examples of approaches to
support conceptual modelling in an object-oriented
way.

The paper aims at bridging object-oriented
conceptual modelling and object-oriented
implementation. To do so, we propose an interface
supported  by  a  software  tool  and  based  on  a  generic 
object-oriented  conceptual  pivot  model  and  a  set  of 
mapping  rules. 

The problem of the design and the implementation
of information systems was an impetus for the
development of a framework for an object pivot model,
based on a general metamodel and the Object Oriented
Conceptual Pivot Model (OOCPM). The following
section presents the object oriented conceptual
pivot model. Section 3 illustrates the framework
and a set of mapping rules by means of examples.

2- The Object-Oriented Conceptual Pivot
Model:

Modelling the real world consists of
grouping objects into classes of the same type and
describing the relationships that can exist between
them.

In developing an information system, real world
classes (or entities) are modeled by corresponding
objects in the information system. To describe the
data and behavior of an information system, it
should be decided which classes are relevant, and
which behavior is to be expected from those
classes. The behavior of an information system is a
reaction of one or more individual objects to
external events detected by the system. The
behavior of an object depends on the kind of event, its
internal state and possibly the internal states of
other objects with which it may have
relationships.

In specifying objects we distinguish static and
dynamic aspects. The static aspect concerns the
attributes of objects and the relationships between
classes; the dynamic aspect concerns the actions or
operations executed by or on classes.

2.1-  Static  Model : 

Our approach in systems development is to support
the systems analysis and other models by means of
diagram techniques in order to capture the
different aspects of an information system. After the
modelling of a pivot conceptual model, it is
possible to map the OOCPM description to
different database environments and programming
languages. Our aim is to bridge the gap between
object-oriented conceptual modelling and
object-oriented implementation. To do so, we propose an
interface supported by a software tool and based on
a pivot model and a set of mapping rules, so
that database schemes and code can be generated
automatically. The object-oriented approach in
systems development is thus applied in principle
without restrictions caused by the implementation
environment, whether it is object-oriented or not.
For representation of the aspects of an OOCPM we
use diagram techniques:

-  Class  relationships  diagram  for  the  static 
structures  (static  representation). 

- Dynamic diagram (dynamic representation).

To represent the conceptual static structure, we
propose an extension of the E/R model [Chen76],
because its diagrams are well-known, commonly used,
have great expressive power, and are used by most
object-oriented methods (OOSA [Shlaer & Mellor
91], OOA [Coad & Yourdan91], OMT [Rambaugh &
al91]). These methods use the E/R model so that
designers and analysts who are already familiar
with this model have the impression of not
changing their habits.

The model that we present solves some of the
limitations of the existing models (e.g. the
E/R model takes into account the modelling of the
structural and static aspects of a real-world
system; the aspects related to the evolution of
data and the dynamic aspects are not tackled).

The proposed model is a conceptual one; it is
independent of all implementation issues. It takes
into account not only the static or structural aspect
but also the dynamic or behavioral aspect.

It generalizes the object-oriented conceptual
models such as OOD [Booch91], OMT
[Rambaugh & al. 91], OOSA [Shlaer & Mellor 91],
OOA [Coad & Yourdan91], for conceptual
modelling and extends them to the modelling of the
functioning of the methods. The order of the
organization of constraints is not tackled.

To  represent  our  model,  we  select  notations  for 
expressing  class  type,  association,  composition, 
roles,  subtyping  (specialization /generalization) 
and genericity. The figures below represent the
static  relationship  types: 


[Figure: Representation of association between Class type 1 and Class type 2]

Where c1 and c2 are cardinality constraints and r1
and r2 represent roles.

Where c is the cardinality of a composition link and T
represents the type of a generic class.

Association: 

The associations establish relations between
objects. They are expressed between classes and
they may be binary [Rambaugh & al91]. They are
used to describe the conceptual relation that binds
objects together. They can possess their own
attributes and become classes if some operations
are associated by the designer.

Cardinality ratio and participation constraints are
expressed by structural constraints represented by a
pair of integer numbers (n, m), where n is a minimum
and m is a maximum, attached to each participation
of a class type C in an association type A, where
$0 \le n \le m$. This means that each object O in C
must always participate in at least n and at most m
association instances in A. This convention is the
same as the one introduced by Elmasri [Elmasri90].
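
The (n, m) convention above can be illustrated with a minimal sketch. The names `Cardinality`, `allows` and `violations`, and the sample data, are assumptions made for this illustration only; they are not part of OOCPM. Here `m=None` stands for an unbounded maximum (N).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cardinality:
    """Structural constraint (n, m) with 0 <= n <= m; m=None means N (unbounded)."""
    n: int
    m: Optional[int] = None

    def __post_init__(self) -> None:
        if self.n < 0 or (self.m is not None and self.n > self.m):
            raise ValueError("cardinality must satisfy 0 <= n <= m")

    def allows(self, count: int) -> bool:
        """True if taking part in `count` association instances is legal."""
        return self.n <= count and (self.m is None or count <= self.m)


def violations(participation: Dict[str, int], card: Cardinality) -> List[str]:
    """Return the objects whose number of association instances breaks (n, m)."""
    return [obj for obj, count in participation.items() if not card.allows(count)]


# Hypothetical data: each Order must be associated to exactly one Client, i.e. (1, 1).
order_to_client = {"order-1": 1, "order-2": 0, "order-3": 2}
print(violations(order_to_client, Cardinality(1, 1)))
```

In this sketch the check is run over a table of participation counts; in a real design the same test would more likely sit in the operations that create or remove association instances.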

Roles represent a temporal behavior of an object
[Pernici90]. Classes may play several roles
simultaneously.

For instance, in figure 6, the object Order is
associated to only one Client.


Composition: 

The link of aggregation defines a "part-of"
relationship between an aggregate object and one or
many component objects. It can be considered as a
special link of association but with stronger
constraints.

Notice that two objects having independent
life-cycles should not be linked by a link of
aggregation but by an association [Rumbaugh&
al91]. The link of aggregation is a common concept
of the object-oriented analysis methods
[Manfredi89], [Henderson-Sellers91],
[Teisseire91]. It is defined by the composition link
in our model.

The semantics of a composition link between a
composite object and a component object is that the
composite object is composed of the component
object, which is strongly dependent on and belongs
exclusively to the composite object. A component
object cannot be shared between composite objects,
nor can it change its composite object.

The composition link induces a strong coupling of
the dynamic characteristics associated with the
composite and the component object. It is summarized
in the following rules:

The creation of a composite object implies the
creation of at least one component object. The
deletion of a composite object implies the deletion
of all its components. Creation, modification and
deletion of component objects may be achieved only
through the composite object.
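
The three rules above can be sketched as follows. This is only an illustration under assumptions: the class and method names (`Composite`, `Component`, `create_component`, `delete_component`) are hypothetical and do not come from the OOCPM specification.

```python
class Component:
    """A component object; it belongs exclusively to one composite."""

    def __init__(self, owner: "Composite", name: str) -> None:
        self.owner = owner
        self.name = name
        self.deleted = False


class Composite:
    def __init__(self, first_component: str) -> None:
        self.deleted = False
        self._components: list = []
        # Rule 1: creating the composite creates at least one component.
        self.create_component(first_component)

    def create_component(self, name: str) -> Component:
        # Rule 3: components are created only through the composite.
        if self.deleted:
            raise RuntimeError("composite has been deleted")
        component = Component(self, name)
        self._components.append(component)
        return component

    def delete_component(self, component: Component) -> None:
        # Rule 3: components are deleted only through their own composite.
        if component.owner is not self:
            raise ValueError("a component cannot be shared between composites")
        self._components.remove(component)
        component.deleted = True

    def delete(self) -> None:
        # Rule 2: deleting the composite deletes all its components.
        for component in self._components:
            component.deleted = True
        self._components.clear()
        self.deleted = True

    @property
    def components(self) -> tuple:
        return tuple(self._components)


# Usage, following the text: Account is a part of Client.
client = Composite("account-1")
second = client.create_component("account-2")
client.delete()
print(second.deleted, client.components)
```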

For  example,  in  figure  6,  Account  is  a  part  of  Client. 

Inheritance: 

Subtyping, or the inheritance relationship, aims to
factor out the structural properties, behavioral
properties and constraints common to several
classes (subclasses) into a class higher in the
hierarchy (superclass).

The inheritance considered is inheritance by
inclusion, which has the "is-a" or "is-like"
semantics between different elements:

$$\text{Generalized object structure} \subseteq \text{Specialized object structure}$$

$$\text{Generalized object behavior} \subseteq \text{Specialized object behavior}$$

As  a  consequence,  it  is  a  semantic  link  relating  two 
objects  of  the  classes  on  which  it  is  defined.  It  does 
not  presuppose  the  exclusivity  of  the 
specialization.  Three  types  of  inheritance 
constraints  may  be  defined  with  our  model.  They 
restrict  the  possibilities  of  existence  of  the  objects 
of  several  specialized  classes,  for  each  object  of  the 
generalized  class  [Brunet  91]  : 

Disjunction constraint: A disjunction constraint
between several specialized classes expresses that
the intersection of their extensions is empty (1).

$$\langle S \rangle_i \cap \langle S \rangle_k = \emptyset, \quad \forall i \neq k$$

For instance, the classes Car and Van both inherit
from the class Vehicle. The disjunction constraint
implies that a vehicle is either a car or a van, or
something else.

Covering constraint: A covering constraint between
several specialized classes expresses that the
union of their extensions is equal to the extension of
the generalized class.

$$\bigcup_i \langle S \rangle_i = \langle G \rangle$$

For instance, if the classes Client and Supplier
both inherit from Person, this constraint stipulates
that each person must be a client or a supplier (or
both).

(1) Let G be a generalized class (abstract or persistent) and
$S_i$ a specialized class i (a subclass of G).
$\langle G \rangle$ is the set of G instances and
$\langle S \rangle_i$ is the set of $S_i$ instances.

Notice also that it is possible to combine several
constraints. For instance, the O* partition
constraint is represented by a disjunction and a
covering constraint (i.e. each instance of the
generalized class is specialized in one and only one
of the specialized classes. For example, a person is
either a man or a woman).


$$\left(\bigcup_i \langle S \rangle_i = \langle G \rangle\right) \ \text{and} \ \left(\langle S \rangle_i \cap \langle S \rangle_k = \emptyset, \ \forall i \neq k\right)$$


With these constraints, we cover all the types of
constraints that exist in most models.
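
As an illustration, the disjunction, covering and partition constraints can be checked directly on class extensions represented as sets of instances. The function names and the sample extensions below are assumptions made for this sketch.

```python
from itertools import combinations
from typing import Iterable, List, Set


def is_disjoint(subs: List[Set[str]]) -> bool:
    """Disjunction: extensions of the specialized classes are pairwise disjoint."""
    return all(a.isdisjoint(b) for a, b in combinations(subs, 2))


def is_covering(general: Set[str], subs: Iterable[Set[str]]) -> bool:
    """Covering: the union of the specialized extensions equals <G>."""
    return set().union(*subs) == general


def is_partition(general: Set[str], subs: List[Set[str]]) -> bool:
    """Partition: disjunction and covering hold together."""
    return is_disjoint(subs) and is_covering(general, subs)


# Hypothetical extensions: Client and Supplier both inherit from Person.
persons = {"ann", "bob", "eve"}
clients = {"ann", "bob"}
suppliers = {"bob", "eve"}

print(is_covering(persons, [clients, suppliers]))   # every person is a client or a supplier
print(is_disjoint([clients, suppliers]))            # "bob" is both, so not disjoint
print(is_partition(persons, [{"ann"}, {"bob", "eve"}]))
```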


Genericity:

Genericity provides a way to parameterize
classes. A parametrized class, or so-called generic
class, is a class that is used as a model by
other classes. It has generic parameters and
cannot be instantiated directly. A class is derived
from the parametrized class by providing a value for
its parameters. The notion of genericity is supported
by many object-oriented programming languages.
In the others, inheritance can in practice support
whatever is needed to carry out genericity [Meyer
90]. Having inheritance and genericity
simultaneously is nevertheless considered useful in
practice [Booch91]. For our model, typing would be
meaningless without the possibility of defining
generic classes. A generic class is one that has one
or more parameters representing types. A class
with one generic parameter is declared in the
form

Class <class-name> [T]

And  used  by  clients  in  declaration  of  the  form : 

x:  <class-name>  [A] 

Where  A  is  some  type. 
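
For readers more familiar with programming languages than with the notation above, the declaration `Class <class-name> [T]` and its use `x: <class-name> [A]` have a close analogue in Python's `typing.Generic`. This is only an analogy: the class name `Stack` is hypothetical, and unlike the generic classes of the model, a Python generic class can be instantiated without fixing its parameter.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # the generic parameter, playing the role of [T]


class Stack(Generic[T]):
    """A class with one generic parameter T."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


# Client declaration of the form x: <class-name> [A], with A = int.
x: Stack[int] = Stack()
x.push(3)
x.push(4)
print(x.pop(), len(x))
```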


Constraints: 

Some kinds of static constraints, which concern
relationships between two objects (e.g. cardinality
constraints, association constraints), are specified
by instantiation of the composition and association
concepts. Other static constraints that are local to
an object may be expressed by equations on the
scheme properties and associations. They are
specified in the constraints item of the class, as
illustrated in figure 7.


Integrity constraints are specified to ensure
that the attribute values, states and behavior of
objects in an information system accurately model
the corresponding real-world objects.

Properties: 

A  class  is  composed  of  properties  that  determine 
its  structure. 

A property is strongly dependent on the owner
object, and determines one of its characteristics. A
property may be changed only by using the
operations defined in the class. It is defined by a
name and a set of values. It takes its values
in a domain. The types of data, or domains,
which are used in our model are those which are
used in the object-oriented models: the pre-defined
types (integer, real, string, text, ...), enumerated
types defined by the analysts, and the pre-defined
types to which an interval or a set of rules can be
associated. The most well-known object-oriented
analysis methods today [Booch91],
[Coad&Yourdan90], [Rumbaugh&al91], [Shlaer
&Mellor91] do not define more complex data.
Notice that our model supports multiple
domains, as most programming languages do.

2.2- Dynamic Model:

The first elements to be represented are the
actions. They are executed in the organization
according to management rules. These actions modify
the state of the elements. They are invoked in
certain precise situations: the arrival of a message
coming from outside, a noteworthy change of state
that happens in the organization, or a predetermined
point in time.

These situations correspond to events. The concept
of event is used in several methods such as IDA
[DeAntonellis81], Remora [Rolland82], O*
[Brunet91].

Actions and events are the two key concepts for
modelling the dynamics in our model.

The  actions  and  the  events  of  the  Universe  of 
Discourse  are  represented  by  the  concepts  of 
operations  and  events  which  are  described  as 
follows : 

Events: 

The event concept [Rolland88], [Brunet91] is
introduced in order to model the dynamic aspects of
the application domain. Operations express how
objects change. Events explain why they undergo
changes.

An event occurs when a noteworthy state change
happens either in the environment or in one object of
the information system itself, or at a
predetermined time. It triggers one or several
operations on one or several objects. An event
stimulated by the environment is referred to as an
external event, an event due to an object state
change is called an internal event, and the third
kind of event is called a temporal event. An event
has a name. Its definition includes a predicate part
which specifies its occurrence condition, and a
triggering part which specifies the operations to
be triggered with their associated conditions and
iterations.
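
The structure of an event (name, kind, predicate part, triggering part) can be sketched as below. All names here (`Event`, `occurs`, `low_stock`, `LEVEL`) are assumptions made for this illustration; conditions and iterations on the triggered operations are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Snapshot = Dict[str, int]  # attribute values of one object at a given moment


@dataclass
class Event:
    name: str
    kind: str  # "internal", "external" or "temporal"
    # Predicate part: occurrence condition over the old and new object state.
    predicate: Callable[[Snapshot, Snapshot], bool]
    # Triggering part: the operations to be triggered when the event occurs.
    triggers: List[Callable[[], None]] = field(default_factory=list)

    def occurs(self, old: Snapshot, new: Snapshot) -> bool:
        """Evaluate the predicate and, if it holds, trigger the operations."""
        if not self.predicate(old, new):
            return False
        for operation in self.triggers:
            operation()
        return True


supplier_orders: List[str] = []


def create_supplier_order() -> None:
    supplier_orders.append("supplier order")


LEVEL = 10  # hypothetical replenishment level
low_stock = Event(
    name="low_stock",
    kind="internal",
    predicate=lambda old, new: old["stock"] >= LEVEL > new["stock"],
    triggers=[create_supplier_order],
)

low_stock.occurs({"stock": 12}, {"stock": 8})  # stock crosses the level: triggers
low_stock.occurs({"stock": 8}, {"stock": 5})   # already below the level: no trigger
print(len(supplier_orders))
```

Keeping the predicate separate from the triggered operations mirrors the distinction made in the text: the event explains why a change is needed, the operation expresses how the object changes.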

The  advantages  of  the  event  concept  in  the  object 
definition  are  as  follows  : 

- All the static and dynamic phenomena are
specified in classes; object encapsulation is
realized by operations and events.

- Behavior and operational dependencies are
clearly specified.

- Events lead to the study of local situations fully
delimited by the state change of a single object.

Our  notion  of  event  is  similar  to  the  one  described 
in  O*  model  [Brunet91].  It  is  textually  described 
below  the  keyword  "event". 

Operations: 

The  evolution  of  the  objects  is  effected  by 
executions  of  operations. 

Following the encapsulation principle of the
object-oriented paradigm, the only way of
affecting (i.e. creating, modifying or deleting) an
object is to execute an operation specified on the
class. Figure 2 illustrates an example of the
graphical representation of OOCPM operations.


Graphical representation:

[Figure: class ORDER with its operations Creation,
delivery, invoice and cancellation]

Fig 2: Description of OOCPM operation


The  execution  of  an  operation  is  always  the  result 
of  an  event  occurrence.  Different  events  may  trigger 
the  same  operation. 

Dynamic  link: 

The utilization link establishes a path allowing
messages to pass from one object to another object.
The semantics of the utilization link is that a client
object uses the services of a server or supplier
object. It is called the client-server link in the OOA
[Coad90] model. In our model, this link is
translated by a dynamic link.

The dynamic relationships among objects are
explicitly specified through events and
simultaneous triggering of operations. It is
important to make these relationships explicit in the
conceptual model because an important part of the
real world complexity is due to these dynamic
interactions [Brunet91].

Behavioral constraints:

Behavioral constraints concern the dynamic
aspects of a class; they restrict the possibilities of
execution of some operations. As for the structural
constraints described before, behavioral
constraints are local to a class.

We chose to specify the dynamic constraints by
way of a state transition graph, rather than by
assertions of first-order logic [Sernadas89] or
temporal logic [Jungclaus91], for the following
reasons:

A state transition graph describes the dynamic
constraints local to one object (principle of
localization) and provides a simple and abstract
view of the behavior of the object. It facilitates the
specification, completion and validation of
dynamic constraints, as in OOD [Booch91] and OMT
[Rumbaugh&al.91].

States determine sets of legal actions on objects.
They can be represented by attributes. State
transitions are caused by actions on objects. Actions
are caused by events. An action induces the
transition of an object from one consistent state to
another consistent state. A state transition may be
represented by a function:

$$\delta : S \times \Sigma \to S$$

where S is a finite set of states and $\Sigma$ is a
finite set of operations.

The transition function indicates the resulting
state when the object is in a certain state and an
operation is invoked. The nodes of the graph are the
states; the arcs and the labels are fixed by the
transition function.

For instance, in order to represent the case of an
electric bulb, the operations are the moves of the
interruptor (switch). They are given as follows:

S = {lighted, extinct}

OP = {invoke the interruptor, release the
interruptor}

$s_0$ = lighted

$\delta$(extinct, invoke the interruptor) = lighted
$\delta$(extinct, release the interruptor) = extinct
$\delta$(lighted, release the interruptor) = extinct
$\delta$(lighted, invoke the interruptor) = lighted

This example may be represented by a state
transition diagram as follows:

[Figure: state transition diagram of the electric bulb,
with arcs labelled "release the interruptor" and
"invoke the interruptor"]
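
The transition function of the bulb example can also be written as a lookup table. The sketch below is an illustration only; the names `DELTA` and `run` are assumptions, while the states, operations and transitions are those listed in the text.

```python
LIGHTED, EXTINCT = "lighted", "extinct"
INVOKE, RELEASE = "invoke the interruptor", "release the interruptor"

# Transition function: (current state, operation) -> next state.
DELTA = {
    (EXTINCT, INVOKE): LIGHTED,
    (EXTINCT, RELEASE): EXTINCT,
    (LIGHTED, RELEASE): EXTINCT,
    (LIGHTED, INVOKE): LIGHTED,
}


def run(state, operations):
    """Apply the operations in order and return the final state."""
    for operation in operations:
        state = DELTA[(state, operation)]
    return state


# Starting from the initial state s0 = lighted.
print(run(LIGHTED, [RELEASE, INVOKE, INVOKE]))  # prints "lighted"
```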

2.3- Scheme representation:

A class type provides an unambiguous description
of the structural and behavioral characteristics
that are common to a population of objects.

Each class type has one or more properties $p_i$,
which form a set P ($p_i \in P$). The value of a
property remains unchanged during the life cycle
of the object occurrence.

The state of an object occurrence is represented by
an attribute state. The different state values $st_i$
of a class type form a set St ($st_i \in St$).

Each class type has one or more constraints $cn_i$,
which form a set Cn ($cn_i \in Cn$).

Each class type has one or more operations $op_i$,
which form a set OP ($op_i \in OP$). Each action has a
valuation, i.e. one or more rules stating how
attribute values are changed or computed by the
execution of an operation.

Each class type has one or more events $ev_i$, which
form a set Evt ($ev_i \in Evt$).

A class type can be described as a tuple <P, St, Cn,
OP, Evt>.

We  give  here  the  definition  of  the  textual 
specification  of  the  base  class  type.  An  extended 
Backus-Naur  Form  is  used  to  specify  the  syntax  to 
be  derived  from  the  conceptual  framework : 


<static constraints> :: <attribute constraints> |
    <inheritance constraints> | <uniqueness constraint>
<attribute constraints> :: <expression>
<expression> :: <simple_expression> |
    <composed_expression>
<simple_expression> :: <term>
    <comparison_operator> <term> | <term>
<term> :: <attribute-name> |
    OLD.<attribute-name> | NEW.<attribute-name> |
    <method-name> | <constant-name>
<composed_expression> :: <simple_expression>
    <logical_operator> <expression>
<predicate> :: <expression>
<operator> :: <logical_operator> | <set_operator>
<logical_operator> :: OR | NOT | AND
<set_operator> :: IN | UNION | INTERSECT
<comparison_operator> :: = | > | < | ≥ | ≤ | ≠
<inheritance constraints> :: <disjunction> | <covering>
<disjunction> :: $\langle S \rangle_i \cap \langle S \rangle_j = \emptyset, \ \forall i \neq j$
<covering> :: $\bigcup_i \langle S \rangle_i = \langle G \rangle$
<uniqueness constraint> :: UNIQUE {<attribute>,}+
<operation> :: <operation-name> <Body of
    operation> [<Type of Operation>]
<Type of Operation> :: PUBLIC | PRIVATE |
    PROTECT
<event> :: <event name> <event type>
    [<predicate> | <message>] <trigger>
<message> :: {<parameters>}+
<trigger> :: <operation name> ON <class
    name> [{<Factor>}+] [{<Condition>}+]


Class <class-name> [<Type>]
[Inherits from {<superclass-name>}+]
[Properties
    {<attribute-name>: <attribute-type>}+]
[States
    {<state>}+]
[Associations
    {<association>}+]
[Constraints
    {<static constraint>}+]
[Operations
    {<operation>}+]
[Events
    {<event>}+]
End-class


With:

<Type> :: The generic parameter
<attribute-type> :: <basic_domain> |
    <collection_domain> | <aggregate_domain> |
    <enumerated_domain> | <interval_domain>
<basic_domain> :: integer | real | date | string |
    boolean ...
<collection_domain> :: SET OF (<class-name>)
<enumerated_domain> :: ENUMERATED ({value}+)
<interval_domain> :: [min..max]
<aggregate_domain> :: AGGREGATE OF
    (<aggregate-name-class>)
<association> :: ASSOCIATION
    <association-name> OF <class-1-name>
    [Cardmin,Cardmax], <class-2-name> [Cardmin,Cardmax]
<state> :: $\delta : S \times \Sigma \to S$


The following figure presents a synopsis of the use
of different concepts in several representative
methods of object-oriented analysis and design.
Some methods focus on the static characteristics
(OOSA, OOA, OMT, MCO), others on the dynamic
ones (OOD). Our model, OOCPM, takes into
account the two aspects (static and dynamic).

3- Mapping of OOCPM

3.1-  General  Overview 

OOCPM aims at ensuring a mapping, guided by a
software tool, from any conceptual specification
towards an implementation. The interface consists
of an object-oriented pivot model, Rc rules for the
mapping from the conceptual model to the pivot
model, and Ri rules from the pivot model to an
object-oriented implementation (language and
persistence).

The  figure  below  illustrates  the  pivot  model: 


Fig  4:  OOCPM 


Where Rc rules are used to transform the user's
conceptual schema into the pivot model, Ri rules are
applied to lead to an object-oriented programming
environment and Rr rules are used for refinement.

3.2- The Mapping rules between the Object-oriented
analysis models and OOCPM

The  conversion  of  a  given  conceptual  schema  (a 
result  from  a  certain  modelling)  to  a  standard 
software  architecture  with  the  help  of  the 
concepts  of  our  pivot  model  is  carried  out  in  the 
following  stages: 

Firstly, the conceptual schema is transformed into
an OOCPM schema by the application of very precise
transformation rules that ensure the traceability
between a given modelling and a standard generic
design.

Secondly, the results of this transformation are
transformed, by applying a set of rules, into a
language or an environment (design product) that
will be ready to be implemented.


3.2.1-  Transformation  rules  of  the  conceptual 
schema 

The transformation of a conceptual schema (i.e.
the product of modelling) into elements of the
OOCPM is presented below. This study is based on
the 6 well-known conceptual models (OMT, OOA,
OOD, OOAD, MCO, O*).

Note also that the conversion into the conceptual
pivot model may take one of the following forms:

- simple mapping (1-1); it is represented by a
function: m(Cs) = Cp

- grouping (n-1); it is represented by a function:
g({Cs}) = Cp

- retyping (1-1); it is represented by a function:
r(Cs) = Cp

- exploding (1-n); it is represented by a function:
e(Cs) = {Cp}

where Cs is a source concept and Cp is a pivot
concept.
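
The four conversion forms differ only in how many source and pivot concepts they relate. A minimal sketch that records a conversion and checks this arity is given below; the function name `convert`, the dictionary layout, and the grouping and exploding examples are hypothetical, while the `m` and `r` examples are taken from the rules that follow.

```python
from typing import Dict, List, Tuple

KINDS = {"m", "g", "r", "e"}
SOURCES_MANY = {"g"}  # grouping: n source concepts -> 1 pivot concept
PIVOTS_MANY = {"e"}   # exploding: 1 source concept -> n pivot concepts


def convert(kind: str, sources: List[str], pivots: List[str]) -> Dict[str, object]:
    """Record one conversion after checking the arity implied by its kind."""
    if kind not in KINDS:
        raise ValueError("unknown conversion kind: " + kind)
    if not sources or not pivots:
        raise ValueError("at least one source and one pivot concept are required")
    if kind not in SOURCES_MANY and len(sources) != 1:
        raise ValueError(kind + " takes exactly one source concept")
    if kind not in PIVOTS_MANY and len(pivots) != 1:
        raise ValueError(kind + " yields exactly one pivot concept")
    result: Dict[str, object] = {
        "kind": kind,
        "sources": tuple(sources),
        "pivots": tuple(pivots),
    }
    return result


# Examples taken from the rules of this section.
print(convert("m", ["O* class"], ["Pivot class"]))
print(convert("r", ["OMT aggregation"], ["Pivot composition link"]))
# Hypothetical grouping example: two source concepts merged into one pivot concept.
print(convert("g", ["source concept A", "source concept B"], ["pivot concept"]))
```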

To  transform  the  user's  object-oriented  conceptual 
schema,  the  following  Rc  rules  must  be  used: 

Rc_Class: Each source entity or source class is
transformed  into  a  pivot  class  in  which  the  visible 
attributes  are  the  properties  which  characterize 
the  corresponding  class. 

For  instance, 

m(O* class) = Pivot_class
m(MCO class) = Pivot_class
m(OMT class) = Pivot_class
m(OOA class) = Pivot_class

Rc_Abst_Class:  Each  source  abstract  class  or  actor 
class  is  transformed  into  a  pivot  abstract  class  (a 
class  without  instance  variable). 

For  instance, 

m(O* actor class) = Pivot_abstract_class
m(MCO abstract class) = Pivot_abstract_class

Rc_Attributes: Each attribute of type property is
translated into a pivot property attribute according
to its type (predefined, LIST OF, SET OF,
ENUMERATED, AGGREGATE OF ...).

Rc_Att_Hist: Each variable attribute that can be
historised, i.e. that requires all its state changes
to be memorized, is transformed into an aggregate
attribute in which the temporal aspect (date,
hour) is specified.

r(Attribute_Hist) = Aggregate_Att

For instance, to keep all the client addresses
during their whole lives:

Class client
Properties
    address: AGGREGATE OF (ad: AGGREGATE OF
        (street: string, number: integer, city:
        string, country: string), d: DATE)

Rc_Stat_Const: All source concepts of type static
constraint are translated into a pivot static
constraint according to their nature (uniqueness or
attribute).

m(O* Uniqueness Constraint) = Pivot
Uniqueness Constraint

m(O* Attribute Constraint) = Pivot Attribute
Constraint

Rc_Ass_Class: The notation of an associative class
type was adopted from OMT [Rumbaugh&al91],
OOSA [Shlaer&Mellor91], OORM [Hwang90] and
EROOS [Baelen92]. An associative object may
have attributes, a life cycle and a role. The concept
of associative entity corresponds to the composite
entity as it was introduced by Chen [Chen85] and
used by OMT and OOSA under the name of
associative class.

This type of class will be transformed into a pivot
class by specifying the links of association with
the rest of the classes involved.

Every source associative class (for example: OMT,
OOSA) is transformed into a pivot class in which
the attributes are those from the associative class.
For instance,

r(OOSA associative) = Pivot class
r(OMT associative) = Pivot class

Rc_Ass_RelationShip:  Every  source  relationship 
of  type  (n-m),  which  is  accompanied  by  properties 
is  transformed  into  a  pivot  class.  The  attributes  of 
this  class  are  the  properties  which  characterize 
the  association  type.  The  identifiers  of  the  classes 
or  entities  implied  in  the  association  are  added 
into  the  properties  of  the  pivot  class. 

For  instance, 

m(OOSA association) = Pivot association type
m(Henderson_Sellers association) = Pivot
association type
m(OMT association) = Pivot association type

Rc_Agg_Link:  Every  source  aggregation  link  is 
transformed  into  a  pivot  composition.  This  link 
will  be  simple  or  multiple. 

For  instance, 

r(OMT aggregation) = Pivot composition link
r(OOA aggregation) = Pivot composition link
r(BON aggregation) = Pivot composition link

Rc_Comp_Type: Every source association link in
which c2 = (1,1) is transformed into a pivot
composition link according to the cardinality c1:
if $c_1 \in \{(0,1), (1,1)\}$ => Simple
if $c_1 \in \{(0,N), (1,N)\}$ => Multiple
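
The rule Rc_Comp_Type is simple enough to be written as a small decision function. The sketch below is an illustration only; the function name `composition_kind` and the encoding of N as the string "N" are assumptions.

```python
from typing import Optional, Tuple, Union

N = "N"  # unbounded maximum cardinality
Card = Tuple[int, Union[int, str]]


def composition_kind(c1: Card, c2: Card) -> Optional[str]:
    """Return "Simple" or "Multiple", or None when c2 != (1, 1) and the rule does not apply."""
    if c2 != (1, 1):
        return None
    if c1 in {(0, 1), (1, 1)}:
        return "Simple"
    if c1 in {(0, N), (1, N)}:
        return "Multiple"
    raise ValueError("cardinality c1 is not covered by the rule")


print(composition_kind((0, 1), (1, 1)))  # Simple
print(composition_kind((1, N), (1, 1)))  # Multiple
print(composition_kind((1, N), (0, N)))  # None: left to the remaining rules
```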


Rc_Sem_Link: Every other source semantic
link is transformed into a pivot association link by
specifying the characteristics between the classes
involved. For instance,

r(OOD Utilization link) = Pivot association link
r(OOSA Utilization link) = Pivot association
link
r(BON Utilization link) = Pivot association link

Rc_Inheritance_Link: Every simple or multiple
source inheritance link is transformed into a simple
or multiple pivot inheritance link (i.e. inheritance
by specialization) by specifying, if necessary, the
constraints that restrict the possibilities of
existence of the objects of several specialized
classes, for each object of a generalized class.
For instance,

r(OMT Inheritance link) = Pivot inheritance
link (with a disjunction constraint)
r(OOSA Inheritance link) = Pivot inheritance
link (with a disjunction constraint)
m(O* Inheritance link (disjunction constraint)) =
Pivot inheritance link (with a disjunction
constraint)

m(O* Inheritance link (covering constraint)) =
Pivot inheritance link (with a covering
constraint)

m(O* Inheritance link (partition constraint)) =
Pivot inheritance link (with disjunction and
covering constraints)

Rc_Genericity: Every source genericity concept is
transformed into a pivot genericity by specifying a
type. For instance,

m(OOD genericity) = Pivot genericity link

The same example is described hereafter using the
two source models, O* and MCO, and is then
translated into OOCPM.

In the O* example, we note a covering constraint:
each person must be a client or a supplier (or both).
In the MCO example, this constraint is represented
by an abstract class Person (non-instantiable)
whereas Client and Supplier are persistent classes.
Lower level classes may be created to represent
clients who are suppliers at the same time.
A discussion about the creation of such classes can
be found in [Castellani93].

To represent static links between objects, O* uses a
one-way arrow whereas MCO requires a two-way
arrow.

Fig 5: MCO and O* graphical descriptions of the
static relationships between classes (Source Models)


Fig 6: An OOCPM graphical description of the static
relationships between classes (Pivot Model)

The dynamic concepts (operation, event, state
transition graph, service, actor ...) are mapped
using the following rules:

Rc_Operation_Action: All source concepts of type
operation, action or service are translated into a
pivot operation.

Some models describe operations by a text in
natural language (for example O*, MODWAY
[Giuvet93]). This text specifies the operation
purpose and the rules according to which attributes
and states are valued or changed. In this case, the
designer has to give the details of the algorithm
using, when needed, the classical instructions
(IF...THEN...ELSE...ENDIF, WHILE...END, ...).
If the model uses a formal specification language,
the translation into a pivot model specification
language will be done automatically. For instance,
r(O* operation) = Pivot Operation (with a
specification of the body of the operation)


For instance, an order creation operation is
translated into OOCPM as follows:

Class ORDER
Properties
    creation_date: DATE
    delivery_date: DATE
    invoice_date: DATE
States
    {created, delivered, invoiced}
    $\delta$($s_0$, Create_Order) = created
    $\delta$(created, Delivery) = delivered
    $\delta$(delivered, Invoicing) = invoiced
Associations
    ASSOCIATION Ord_line OF Order [1,N],
        Order_Line [cardinality illegible in source]
    ASSOCIATION Clt_Ord OF Order [1,1], Client
        [0,N]
Constraints
    creation_date ≤ delivery_date
    delivery_date ≤ invoice_date
Operation
    Create_Order: PRIVATE
    (Var: number, ord_line, create_date, ...)
    Precondition: absent order
    Body
        {Create_Order_Line()}+
        -- create one instance of order
    EndBody
    Postcondition: state = 'created'
End

Fig  7:  An  example  of  operations 

Rc_Event: Each source event type is transformed
into a pivot event depending on its type (internal,
external or temporal). For instance,
r(OMT Event) = Pivot Event (according to type)

Rc_Int_Event: Each source internal event is
transformed into an internal event encapsulated in
the corresponding class. For instance,
m(O* Internal Event) = Pivot Internal Event

Rc_Ext_Event: Each source external event is
transformed into an external pivot event. The
attributes of this source event type constitute
the corresponding external message. For instance,
m(O* External Event) = Pivot External Event
m(MCO Event Model Object) = Pivot External
Event

Rc_Temp_Event: Each source temporal event is
transformed into a temporal pivot event. For
instance,

m(O* Temporal Event) = Pivot Temporal Event

For example, the product event (out of stock) can be
translated into OOCPM as follows:

Class PRODUCT
Operations
Events
    out_of_stock: internal
    predicate
        (OLD.qte_stock ≥ replenishment_level) and
        (NEW.qte_stock < replenishment_level)
    trigger
        create ON Supplier_Order

Fig 8: An example of the textual description of the
out_of_stock event

3.3- The Mapping rules between OOCPM
and target languages

The  second  set  of  rules  is  used  for  the  translation 
from  an  OOCPM  specification,  already 
established  before,  towards  a  target  object- 
oriented  implementation. 

Ri_Class: Each pivot class is translated into a
class within the target language.

Ri_Abst_Class: Each pivot abstract class is
translated into an abstract class or deferred class
(a class without instance variables in the OOPL).

Ri_Inherit: The 'Inherits from' concept is
translated into a classical inheritance in the target
languages. To resolve the multiple inheritance
conflict, REDEFINE and RENAME can be used in
the target language.

Ri_Basic_Dom: All object-oriented programming
languages support the <basic_domain> notion.

Ri_Collection_Dom: Each <collection_domain> is
translated using the generic class COLLECTION
[X] or into a collection SET (X), where X is a type.

Ri_Aggregate_Dom: Each <aggregate_domain> is
translated using an abstract class in the target
language.

Ri_Enum_Int_Dom: Each <enumerated_domain>
or <interval_domain> is translated using a
method or routine in the target language.

To  control  state  transitions  starting  from  the  same 
state  and  for  die  same  event  is  by  specifying  a 
precondition.  It  is  possible  that  an  object  remains  in 
the  same  state  as  it  was  before.  To  avoid 
redundancy,  conditions  are  given  as  static 
(attribute,  state,  occurrence,  etc.)  constraints  if 
possible,  if  it  is  not  possible  to  specify  constraints 
for  state  transition  as  static  constraints,  a 
precondition  may  be  specified. 


with  method  or  routine  (precondition  and 
postcondition  are  mandatories). 

Ri.Static.Const;  Each  uniqueness  or  attribute 
constraint  is  translated  into  an  invariant  or  using  a 
specific  method. 

Ri_Static_Link:  All  conc^ts  of  type  composition 
or  association  link  are  trarulated  witih  aggregated 
attributes  and  cardinality  constraints  in  the 
constraint  part.  In  the  case  of  a  strongly 
dependency,  cardinalities  must  be  defined  in  two 
classes,  the  caller  and  called.  However,  in  die  case 
of  a  weak  dependency,  the  cardinalities  are 
expressed  only  in  the  caller  objects.  Each 
cardinality  constraint  is  translated,  into  target 
language,  within  a  specific  method  which  verifies 
the  minimal  and  maximal  cardinalities  (in  the 
caller  and  the  called) 

For instance, the translation of the static aspect of
the pivot conceptual specification, given in Figures
9 and 10, towards the Eiffel and ONTOS/C++
programming environments is as follows:


Class PERSON
Properties
Nss: string(15)
Name: string(.tO)
Age: 10..W)
Address: AGGREGATE OF {num: integer, street:
string(20), city: string(25), country: string(16)}
Constraints
UNIQUE (Nss)

Client CxSupplier * Person
Fig 9: An example of OOCPM textual specification


Ri_State_Trans: All state concepts are translated
with an enumerated attribute (called STATE) and

Fig 10: An example of ONTOS/C++ and Eiffel class


where Object is a predefined ONTOS class. Each
persistent object must be an instance of the Object
class or its derived classes.

The inheritance constraints are mapped in
object-oriented programming using the following
set of rules:

Ri_Inherit_Disj: Each inheritance constraint such
that $<S>_i \cap <S>_k = \emptyset,\ \forall i \neq k$ is translated by a
classical inheritance in the target languages (all
classes are persistent).

Ri_Inherit_Cov: Each inheritance constraint such
that $\bigcup_i <S>_i = <C>$ is translated by an abstract
superclass and the set of subclasses, all persistent.

persistent classes will be created. They
represent all possible intersections between the n
subclasses. To resolve the multiple inheritance
conflict in the object-oriented language, these
classes will be virtual in the C++ language.
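The remark about virtual classes in C++ can be illustrated with a short sketch. It assumes, for the example only, that Client and Supplier are overlapping subclasses of Person, so that one intersection class is needed; the class ClientSupplier and all member names are hypothetical and persistence is left out.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}
    virtual ~Person() = default;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Virtual inheritance ensures a single shared Person sub-object
// when both subclasses are combined below.
class Client : public virtual Person {
public:
    explicit Client(const std::string& n) : Person(n) {}
    int ordersPlaced = 0;
};

class Supplier : public virtual Person {
public:
    explicit Supplier(const std::string& n) : Person(n) {}
    int itemsSupplied = 0;
};

// Intersection class: a person who is both a client and a supplier.
// The most derived class initializes the virtual base directly.
class ClientSupplier : public Client, public Supplier {
public:
    explicit ClientSupplier(const std::string& n)
        : Person(n), Client(n), Supplier(n) {}
};
```

Without the virtual keyword, ClientSupplier would contain two Person sub-objects and a call to name() would be ambiguous, which is the multiple inheritance conflict mentioned above.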

Dynamic aspects of an object are operations and
events. Operations are represented by methods in
object-oriented programming. Operations may
retrieve, change, or delete values of attributes,
change the state of an object, and set up or terminate
relationships with other objects. The OOCPM
operation concept is mapped as follows:


Ri_Operation: Each instance operation is
translated by a specific method in the target
language. Pre- and postconditions are checked in
the method body.

The meaning of the event concept is not the same in
different conceptual models. This concept is
particularly hard to implement in object-oriented
languages, because of the functional principles of
method calls [Kraiem92]. For instance, in O*, it
poses some problems such as:

- the implementation of the internal event
mechanism

- the management of the dynamic transition
during the execution

- the saving of the event succession.

To resolve these problems, we propose a solution
based on two steps:

- when an event is activated, its operations are
triggered and its predicates, which may be
chained, are tested

- then, each event having a true predicate is
activated in sequence.

To  implement  this  mechanism,  we  use  the 
following  rules : 

Ri_Int_Event:  An  internal  event  is  translated  by  a 
private  method  for  the  object. 

Ri_Ext_Temp_Event: Each external or temporal
event is translated by an abstract class and a
method for its execution.

Ri_Event_Method: Every event method is
implemented by a routine in the target language.

Ri_Event_Pred: For every private event method, a
specific method (TEST_PRED) is implemented in
order to test the predicate. A Boolean parameter is
used when calling the method. When the call has
a factor, we must keep the predicate value for each
affected object.

Fig 11: Eiffel implementation of a pivot event.
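The two-step mechanism described above (trigger the operations and test the predicates, then activate in sequence every event whose predicate is true) can be sketched in C++ as follows. This is not the Eiffel code of Fig 11 nor the detailed algorithm of the cited work; the Event structure, the activate function and the use of a plain int as object state are assumptions made to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record: one operation, plus predicates that guard
// the events which may be chained after it.
struct Event {
    std::string name;
    std::function<void(int&)> operation;  // acts on the object state (an int here)
    // Each entry pairs a predicate with the index of the event it may chain to.
    std::vector<std::pair<std::function<bool(int)>, std::size_t>> chained;
};

// Activates events[start] and every event chained from it.
// Returns the event names in activation order (the event succession).
// Note: a cycle of events whose predicates stay true would not terminate.
std::vector<std::string> activate(const std::vector<Event>& events,
                                  std::size_t start, int& state) {
    std::vector<std::string> trace;
    std::vector<std::size_t> pending{start};
    for (std::size_t i = 0; i < pending.size(); ++i) {
        const Event& ev = events[pending[i]];
        ev.operation(state);                    // step 1: trigger the operation
        trace.push_back(ev.name);
        for (const auto& link : ev.chained) {
            if (link.first(state)) {            // step 1: test the predicate
                pending.push_back(link.second); // step 2: activate later, in sequence
            }
        }
    }
    return trace;
}
```

The returned trace corresponds to the saved event succession; a fuller version would keep one predicate value per affected object, as Ri_Event_Pred requires.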

A detailed algorithm for the mapping of events into
programming languages is presented in
[Kraiem92,93].

4-  Conclusion 

OOCPM solves some of the limitations of the
existing models. It is a generic conceptual model
combining the static and dynamic aspects.
Coupling this model with object-oriented
environments mitigates some of their
limitations.

We introduced rules that allow the formulation, at
a high level of abstraction, of program architecture
according to the principles of object-oriented analysis,
design and programming.

In future work, we will address the problem of
reusing generic pieces of the Requirements
Engineering Process, which are called Process
Chunks. This notion will be introduced in our
framework.

A process chunk is a piece of generic knowledge
reusable for requirements engineering issues of the
same kind. It takes the form <situation, decision,
argument, action>.

The System Engineering Technology Interface Specification (SETIS): An Update

I. Introduction

The Design Structuring and Allocation Optimization (DeStinAtiOn) project is a research effort
sponsored by the Naval Surface Warfare Center (NSWC) that attempts to provide systems engineers
with various types of tools and techniques to perform design optimization and tradeoff analysis
[HoNH], [NgHo]. A typical scenario that the DeStinAtiOn project hopes to address is that of a
systems engineer who wants to change nonfunctional requirements (referred to as System Design
Factors [SDF]), such as modifying a constraint in his design, and observe how his changes affect
reliability. In other words, he wants to perform a tradeoff analysis between the system design factors
of performance and reliability [NgHo]. The DeStinAtiOn prototype attempts to arrive at a new
methodology in the area of design optimization and tradeoff analysis.

The project attempts to integrate a number of tools and techniques that will facilitate both the
design and tradeoff analysis aspects of systems engineering. This immediately necessitates the
exchange of information between the different tools being employed. The Systems Engineering
Technology Interface Specification (SETIS) is an approach to facilitating information exchange
across the various tools employed in DeStinAtiOn in particular and in systems engineering in general.
SETIS attempts to incorporate similar information exchange standards that are evolving in the
CASE industry. The CASE Data Interchange Format (CDIF) is currently the leading standard
for the import and export of information between different CASE tools. CDIF is presently oriented
primarily toward software engineering. SETIS, however, extends the technique to include
enhancements for systems engineering information. The intent is to maintain compliance with CDIF. This
document provides an update on the current status of research on SETIS.

II. SETIS: An Overview

The primary goal of SETIS is to provide a standard for information exchange between the various
tools employed in systems design capture, analysis, and simulation and the design optimization
tools and techniques. The following components of systems design are examples of the tools and
techniques that interface through SETIS:

1.  Front-end  CASE  tools  for  capturing  analysis  and  design. 

2.  System  behavior  modeling  tools  for  analysis  and  simulation. 

3. Optimization algorithms such as scheduling, resource allocation and design
structuring.

The various information categories that SETIS covers are illustrated in the following figure:

We will briefly describe the information categories and their implications. (For further
information refer to an earlier paper, [LLPL].) The conceptual model captures parameters that arise
both from the operational environment and from information models [Kare, Hoan]. Through the
conceptual model, the systems engineer and the customer can arrive at an understanding of the
system being designed.

The logical model visualizes the system from the perspective of functional and behavioral
models without paying particular attention to the implementation methodology. This model thus
emphasizes what should be done by the system and not how it should be done. The implementation
model, on the other hand, specifically addresses the "how" part of the system. It envisages the various
hardware and software components that are required to provide the desired functionality of the
system. The mapping model contains pairs that designate how objects are allocated within and across
models. Two mappings that are of particular interest to the research are the mapping between the logical
and implementation models and the mapping of software onto hardware. Moreover, it also accounts for various
human factors that may affect the operation of the system. Objects within any of these models may
be attributed by System Design Factors.

As mentioned earlier, SETIS provides guidance and captures information on different types
of tools and techniques. The Tool/Technique Characterization describes the software packages and
approaches available (both commercially and within the public domain) that may be applied for
capture, analysis, simulation, optimization and assessment, as well as techniques that may be used in the
system life cycle, and also justifies their applicability based on the system requirements or
specifications.


C++ has been used to describe class hierarchies within these models along with data structures
and operations. As an example, operations include procedures for object creation and destruction
and for reading and writing to an ASCII file on disk.
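To give a feel for what such operations might look like, here is a small C++ sketch of a model object that can be created, written to an ASCII stream and read back. It is not the SETIS class library; the class ModelObject, its (name, id) members and the one-line text layout are assumptions for illustration, and the format shown is not CDIF Clear Text.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical model object identified by a (name, id) pair.
class ModelObject {
public:
    ModelObject() : id_(0) {}
    ModelObject(std::string name, int id) : name_(std::move(name)), id_(id) {}

    // Writes the object as one ASCII line: "<id> <name>".
    // Assumes the name contains no whitespace.
    void write(std::ostream& out) const {
        out << id_ << ' ' << name_ << '\n';
    }

    // Reads an object previously produced by write().
    // Returns false, leaving the object unchanged, on malformed input.
    bool read(std::istream& in) {
        int id = 0;
        std::string name;
        if (!(in >> id >> name)) return false;
        id_ = id;
        name_ = name;
        return true;
    }

    const std::string& name() const { return name_; }
    int id() const { return id_; }

private:
    std::string name_;
    int id_;
};
```

Because the methods take standard streams, the same code works with std::ofstream and std::ifstream for a file on disk or with std::stringstream in a test.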

III. Use of SETIS with FE-CASE Systems

Several Front-End Computer Aided Software Engineering (FE-CASE) tools are currently
popular in the real-time systems community. Cadre's Teamwork, IDE's Software through Pictures
(StP), and Mark V Systems' ObjectMaker are ready examples. The systems design engineer may
use these FE-CASE tools to specify portions of the system. In particular, the systems engineer
follows a methodology and uses the icons that the FE-CASE graphics provide to indicate the various
components of the system and their interconnections.

To perform design optimization with DeStinAtiOn, the information captured within the
FE-CASE tool must be imported into DeStinAtiOn. To accomplish this in a standard way for a diverse
set of tools and information, the transfer approach put forth by the CASE Data Interchange
Format (CDIF) Committee has been used. A picture of the transformations necessary to import the
FE-CASE information into SETIS is shown in Figure 2.

Figure 2. Design Capture Interface.


Optimization techniques require additional information beyond what is captured
within the FE-CASE tools, namely information regarding hardware architecture, constraints (e.g.,
timing and placement) and Systems Design Factors (SDF). Remember that the SDFs capture the
nonfunctional requirements of a system and are the subject of tradeoff analysis. For demonstration
purposes we have constructed separate windows that are independent of the CASE system for
entering this additional information. This is shown in the upper right of Figure 2.

SETIS must now integrate the FE-CASE dependent information and the GUI-based
information into a tool independent data set that the simulation and optimization algorithms can use.
The FE-CASE dependent information can now be put out in a common FE-CASE independent form.
SETIS will integrate the FE-CASE dependent information and the corresponding FE-CASE
independent system information into a single data set (currently envisaged as one file) that complies with
CDIF standards.

CDIF envisages the design as moving through several levels of abstraction. The Meta-Meta
Model specifies the system at the highest abstraction: there are Meta-Objects with Meta-Attributes
and Meta-Entities with Meta-Relationships. At the Meta-Model level, the Meta-Objects and their
attributes become more specific, although the objects and relationships are still abstract. The system
components and their relationships become most clear at the Model level. This hierarchical
abstraction is exemplified in the two figures (Figures 4 and 5) shown for the Software Structure. The
Software Structure is a component of the Implementation Model within SETIS, which was shown in
Figure 1 discussed earlier.

Figure 4. Meta Model for Software Structure.


The design is written out to a file in what is called a "Clear Text" format. This file, which is in
ASCII, can be read by any other tool that understands CDIF Clear Text. The model can thus be
reconstructed for the tool to operate on. Our current effort has been to construct the various CDIF models
for the Implementation Model that was shown in Figure 1.

We have been working on a C++ class library complete with data structures and operations
for the model that is discussed in the next section. Users can add their own functions beyond the ones
available in the class library to perform optimization or simulation of the design.

IV. C++ Class Design for the SETIS Implementation Model

The Implementation Model will be used as an example to illustrate the C++ class library
design. In this model, the primary components are the Software Architecture, the Hardware
Architecture and the Mapping that relates these two structures (see Figure 1). Figures 6, 7 and 8 show graphics
of the current C++ class libraries for each of these components.

Software is visualized as a set of interacting software modules. Each module comprises a
number of tasks or submodules. We structure the entire software architecture as a graph where each
node is either a module or a task, and each edge represents an interaction (data or control transfer)
between the nodes. The edges are further classified depending on whether they connect tasks in
different modules or tasks in the same module, and also depending upon the direction of the interaction
(entry and exit). Each node and edge has a unique (name, id) pair that identifies it within the software
structure. The "primitive" subclasses have been added to facilitate future additions.
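A minimal C++ sketch of such a graph is given below for orientation only. It does not reproduce the class library of Figures 6 to 8; the types Node, Edge and SoftwareGraph, the use of a parent-module id to classify edges, and the omission of the entry/exit direction classes are simplifying assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class NodeKind { Module, Task };
enum class EdgeScope { IntraModule, InterModule };

// Each node and edge carries a (name, id) pair.
struct Node {
    std::string name;
    int id;
    NodeKind kind;
    int parentModule;  // id of the enclosing module, or -1 for a top-level module
};

struct Edge {
    std::string name;
    int id;
    int from;          // source node id (exit side of the interaction)
    int to;            // target node id (entry side of the interaction)
    EdgeScope scope;
};

class SoftwareGraph {
public:
    int addNode(const std::string& name, NodeKind kind, int parentModule = -1) {
        int id = static_cast<int>(nodes_.size());
        nodes_.push_back(Node{name, id, kind, parentModule});
        return id;
    }

    // Adds a directed interaction and classifies it by comparing the
    // enclosing modules of its two end nodes. Throws std::out_of_range
    // if either node id is unknown.
    int addEdge(const std::string& name, int from, int to) {
        EdgeScope scope =
            nodes_.at(from).parentModule == nodes_.at(to).parentModule
                ? EdgeScope::IntraModule
                : EdgeScope::InterModule;
        int id = static_cast<int>(edges_.size());
        edges_.push_back(Edge{name, id, from, to, scope});
        return id;
    }

    const Edge& edge(int id) const { return edges_.at(id); }

private:
    std::vector<Node> nodes_;
    std::vector<Edge> edges_;
};
```

Storing the ids as vector indices keeps lookups simple; a library meant for large designs would more likely use explicit identifier maps.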

The hardware structure is almost analogous to the software structure; here too, the hardware
configuration is visualized as a graph with each host being a node and physical communication links
between hosts being edges. The communication aspects of the link, however, necessitate a slightly
different model. There are various communication subsystems that incorporate intelligent software
into their basic hardware to achieve some required functionality. Examples of these are intelligent
packet switchers, which are primarily communication oriented hardware devices, but also include
packet routing software. Such communication links have unique properties and require a special
class called intelligent links. All links can also be classified further depending upon whether they are
dedicated point-to-point communication paths between two hosts, or whether they can be used by
multiple hosts.
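The link classification can be sketched as a small C++ hierarchy. The class names Link and IntelligentLink, the Sharing enumeration and the members shown are assumptions chosen for the example and are not taken from the SETIS class diagrams.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Whether a link is a dedicated path between two hosts or shared by several.
enum class Sharing { PointToPoint, Shared };

// Base class: a physical communication link between hosts.
class Link {
public:
    Link(std::string name, Sharing sharing)
        : name_(std::move(name)), sharing_(sharing) {}
    virtual ~Link() = default;

    const std::string& name() const { return name_; }
    Sharing sharing() const { return sharing_; }
    virtual bool isIntelligent() const { return false; }

private:
    std::string name_;
    Sharing sharing_;
};

// Special class for links whose hardware also runs software,
// for example a packet switch with packet routing software.
class IntelligentLink : public Link {
public:
    IntelligentLink(std::string name, Sharing sharing, std::string software)
        : Link(std::move(name), sharing), software_(std::move(software)) {}

    bool isIntelligent() const override { return true; }
    const std::string& software() const { return software_; }

private:
    std::string software_;
};
```

Keeping the sharing mode as an attribute rather than as further subclasses avoids a combinatorial set of classes, although either design is compatible with the description above.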

Finally, the mapping structure incorporates the various placement and time constraints and
also SDF constraints (e.g., reliability requirements, specified performance requirements, etc.)
specified by the system designer, and arrives at a mapping view that determines which software should run
on which hardware and at what time.
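As an illustration of one part of this, the C++ sketch below checks a candidate software-to-hardware mapping against placement constraints. It deliberately ignores time and SDF constraints, and the type aliases and the function satisfiesPlacement are hypothetical names, not part of DeStinAtiOn or SETIS.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// task name -> host on which the task is placed
using Mapping = std::map<std::string, std::string>;

// task name -> set of hosts on which the task is allowed to run
using PlacementConstraints = std::map<std::string, std::set<std::string>>;

// Returns true if every constrained task is mapped to one of its allowed hosts.
// Tasks without a constraint entry may be placed anywhere.
bool satisfiesPlacement(const Mapping& mapping,
                        const PlacementConstraints& constraints) {
    for (const auto& entry : constraints) {
        auto placed = mapping.find(entry.first);
        if (placed == mapping.end()) return false;          // constrained task not placed
        if (entry.second.count(placed->second) == 0) return false;  // host not allowed
    }
    return true;
}
```

An optimization algorithm could call such a check on each candidate mapping before evaluating it against the remaining constraints.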

These  are  only  some  of  the  factors  that  affected  the  design  of  the  C++  class  hierarchy  for  the 
Implementation Model. Figures 6 and 7 provide the class structures in greater detail.

1. INTRODUCTION

The complexity and sheer magnitude of modern intensive systems (including advanced
combat sensors, and weapons systems) require a disciplined structured approach for development.
The principal objective of the system development process is to establish a design which satisfies
the system requirements and constraints while optimizing the key tradeoffs and issues associated
with system functionality, behavior, and implementation. A multi-domain design capture and
analysis methodology has been developed under the Office of Naval Research's Engineering of
Complex Systems (ECS) Technology Block which partitions the system design into five Design
Capture Views addressing the following principal design perspectives: (1) Environmental, (2)
Informational, (3) Functional, (4) Behavioral, and (5) Implementation. Of these five capture views,
the Implementation Capture View, which addresses the physical systems' architectures (including
hardware, software and humanware), is crucial for the evaluation of the design and yet is not
efficiently addressed by existing system design methods and tools.

An important aspect of the Implementation Capture View is the method for capturing
system resources. The ability to capture and analyze system resources early in the design process
supports the understanding of how resources are utilized, resulting in a major cost reduction in
system integration. The resource capture method needs a robust and flexible mechanism for
characterizing system resources including hardware, software, and human operators. The
mechanism must also allow the manipulation of system resources such that alternate designs can
be traded off in achieving system requirements. The design of large, complex systems
requires the analysis of the logical (functional) as well as the physical (resource) aspect of the
design. Trade-offs should be performed at all necessary stages of the development process.

Hence, the ability to capture system resources at various levels of abstraction and in a systematic
and consistent manner is very important for system engineers. The capture of system resources
is not only used to search for an optimal design, but is also used to document the rationale of
design selections, which will be very important when requirements are changed or added, or when
new technologies become available.

Although existing analysis tools (e.g., simulation, optimization) provide different mechanisms
to specify resources, the representation is very specific to the type of analysis that it supports.

The intention of this resource capture approach is to provide baseline information on the system
resources that can be used by various types of analysis. This will minimize the number of
assumptions generated in the construction of analysis models and help guarantee the
consistency of the analysis results. Objectives and requirements for an advanced complex system
resource modeling methodology and an object-oriented approach for implementing an advanced tool
for resource capture will be addressed in this paper.


2.  RESOURCE  CAPTURE  METHODOLOGY  OBJECTIVES  AND  REQUIREMENTS 

This resource capture methodology is intended to support the characterization of complex
computer-based system resources across the broad spectrum of resource types and at various
levels of design detail. The spectrum of resource descriptions needed to capture the
implementation of a complex system design at various points in the design process may vary from
very specific to generic, from abstract to detailed, and from simple to complex.

In a top-down design process, the design implementation progresses over time, from high
level abstractions of system resources to increasing levels of detailed hardware, software and
human operator representations. Analysis and simulation of a particular design option may be
required at various points in the design process to address trade-offs or demonstrate compliance
with key system requirements. The resource capture methodology must therefore provide a robust,
flexible method for representing resources of many different types and levels of abstraction.

Figure  1  depicts  four  levels  of  abstraction,  from  very  highly  aggregated  resources  down  to  the 
resources at the subcomponent level. (Four levels are picked for illustration purposes only; more
or fewer may be required to support the design capture and analysis needs of a given project.) Early
in  a  top-down  design  process,  the  system  resources  may  be  represented  as  large,  highly 
aggregated  resources  (e.g.,  subsystems)  and  the  system  under  design  is  represented  as  a 
combination  of  interconnected  complex  subsystems.  These  subsystem  resources  are  decomposed 
at  the  next  level  of  design  detail  and  are  represented  as  aggregations  of  hardware  and  software 
components.  At  the  next  level,  components  are  individually  represented,  including  software 
configuration  items  with  their  associated  mapping  to  tasks  in  real-time  system  execution  and 
hardware  units  with  their  associated  lowest  replaceable  units  (LRU's).  Typically,  the  systems 
engineering  activities  do  not  extend  below  this  level  of  detail;  however,  this  resource  capture 
methodology  should  support  linkage  with  hardware  and  software  engineering  methods  and  tools  to 
provide  support  for  the  entire  spectrum  of  the  system  development  environment. 


Highly aggregated resources
-Combinations of subsystems

Aggregated HW/SW/Humanware resources
-Combinations of components

-HW units / replaceable items
-CSCI / Tasks
-Individual operators

-Piecepart HW devices
-CSC / SW Units


Figure 1. Spectrum of Resource Description

The  resource  capture  methodology  must  also  provide  a  mechanism  for  managing  complexity 
in  the  characterization  of  resources  at  various  levels  of  design  detail.  One  or  more  sets  of  rules  will 
be  established  for  linking  resource  representations  at  a  higher  level  of  abstraction  to  resource 
representations  at  the  next  lower  level  of  design  detail.  This  capability  will  support  decomposing 
resource  descriptions  in  a  top  down  design  scenario  or  recomposing  them  in  a  re-engineering  or 
bottom-up design scenario. When the transition from one level of abstraction to another is not
based on decomposition, the capability will provide consistency support for the transformation.
Techniques must also be provided to manage multiple concurrent or alternate resource and design
options.

A potentially large number of possible resource types (including hardware, software, and
human operator categories) must be captured. A mechanism for efficiently representing a large
number of resource types must be provided as well as a flexible, robust approach to portray a
diverse spectrum of resource characteristics and attributes. This capability must be extensible to
include new resource types as well as new characteristics of existing resource types. A
mechanism must also be provided for organizing, accessing and extracting resource descriptions.
This capability will establish a library for the captured resource descriptions and will provide an
interface to analysis, simulation, optimization, and other tools.

Additional  requirements  for  the  resource  capture  methodology  include:  (1)  defining  a  formal 
linkage/mapping  with  the  other  design  capture  views,  (2)  establishing  common  formats  for  resource 
characterization  information  exchange,  and  (3)  providing  a  mechanism  for  system  requirements 
traceability.  The  requirements  described  in  this  section  establish  a  top  level  set  of  objectives  for  an 
advanced  resource  capture  methodology. 


3.  RESOURCE  MODEL  CONCEPT  AND  FEATURES 


Successful employment of a resource capture and analysis methodology in supporting a
complex system design is largely a function of the degree of mechanization which can be achieved.
The size and complexity of large scale systems render manual application of any detailed resource
capture method unusable. Considerable potential benefits can be gained from a highly automated
design capture environment supporting a disciplined structured capture of resource descriptions
which can be employed for design analysis, simulation, and optimization activities.

Recent  advances  in  object  oriented  software  engineering  techniques  provide  a  powerful 
mechanism  for  implementing  a  tool  for  resource  capture  which  embodies  the  requirements  of  the 
previous  section.  Figure  2  illustrates  the  resource  model  implementation  concept  through  an 
Information  Model. 


Figure 2. Information Model of Resource Capture Concept

The shaded area is the essential information that will be captured in the resource capture
tool  and  will  be  addressed  in  the  following  sections.  The  captured  system  resource  architectures 
are  used  to  evaluate  the  design  in  reference  to  different  criteria  which  are  dictated  by  or  derived 
from  the  requirements.  In  order  for  such  analyses  to  be  performed,  additional  information  must  be 
represented  and  linked  to  the  resource  model.  Other  objects  in  this  figure  are  examples  of  the 
required  information  that  will  be  used  in  conjunction  with  the  resource  model  for  the  evaluation  of 
the  design. 

3.1  Resource  Component  Type 

A resource component type is a class structure for a kind of resource that can be used in
the system design process. Each resource type can be instantiated to a number of specific
resources that share some common properties. The resource component type is characterized by
attributes and methods and is organized in a supertype/subtype manner, which leverages inheritance
and polymorphism to efficiently represent a large number of resources.
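The supertype/subtype organization can be pictured with the following C++ sketch. The particular hierarchy (Resource, Hardware, Processor, Software), the cost attribute and the totalCost function are illustrative assumptions and do not reproduce the type tree of Figure 3.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Supertype: attributes and methods shared by every resource component type.
class Resource {
public:
    explicit Resource(std::string name) : name_(std::move(name)) {}
    virtual ~Resource() = default;
    const std::string& name() const { return name_; }
    virtual std::string category() const = 0;      // defined by each subtype
    virtual double costUsd() const { return 0.0; } // default when no cost is known
private:
    std::string name_;
};

class Hardware : public Resource {
public:
    Hardware(std::string name, double costUsd)
        : Resource(std::move(name)), cost_(costUsd) {}
    std::string category() const override { return "hardware"; }
    double costUsd() const override { return cost_; }
private:
    double cost_;
};

// A subtype inherits the Hardware attributes and adds its own.
class Processor : public Hardware {
public:
    Processor(std::string name, double costUsd, int clockMHz)
        : Hardware(std::move(name), costUsd), clockMHz_(clockMHz) {}
    int clockMHz() const { return clockMHz_; }
private:
    int clockMHz_;
};

class Software : public Resource {
public:
    explicit Software(std::string name) : Resource(std::move(name)) {}
    std::string category() const override { return "software"; }
};

// Polymorphic use: one routine handles every resource type.
double totalCost(const std::vector<std::unique_ptr<Resource>>& resources) {
    double sum = 0.0;
    for (const auto& r : resources) sum += r->costUsd();
    return sum;
}
```

New resource types are added by deriving further subclasses, and routines such as totalCost keep working without change, which is the extensibility the methodology asks for.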

Although this classification concept is straightforward, selecting the most efficient
classification is very challenging. The selection of resources can be motivated by functionality
(e.g., operating system, beamforming, detection), by character (e.g., hardware, software), or by
specific domain (e.g., acoustic, electromagnetic environment). Whether a particular classification
tree of resource types can support all possible selections is questionable. The multiple inheritance
concept can be used to support the cross-reference representation of such structures; however, it
will complicate the tracing capability. Figure 3 shows an example of a hardware component type
that was selected for the design of a personal computer.


Figure 3. Example of Resource Component Type

3.2 Resource Component

A resource component is an instantiated representation of a resource component type,
with specific values for attributes, methods, and interfaces. Hence, a collection of compatible
resource components can be derived from an individual resource component type. The
representation of a resource component can be very generic or very specific depending on how many
of its attributes, methods, and interfaces are indicated. The resource component structure can be
complex or simple depending on the available information. Hence, component level does not imply
that the resource is at its lowest level of decomposition.

3.3  System/Subsystem 

A  collection  of  resource  components  can  be  connected  in  a  specific  manner  to  form  a 
system/subsystem.  Based  on  different  connecting  topologies,  alternate  designs  of  the  same 
system/subsystem  can  be  derived  from  a  single  collection  of  resource  components.  The 
connection and reconnection of system/subsystem resource structures must be generated rapidly
and effectively such that multiple evaluations can be performed to achieve an optimal design.

Once the resource components are aggregated into a system/subsystem, its characteristics
can be defined through a set of attributes, methods, and interfaces. Since system/subsystem
characteristics are not defined only by the combination of its components' characteristics, simulation
or actual testing can be used to derive appropriate system/subsystem properties.

3.4  Resource  Library 

The resource library is a repository for descriptions of instantiated resource components,
subsystems, and systems. It is also a placeholder for the system resource information that will be
accessed by analysis tools to evaluate the design. Each element in the library includes a
large amount of information and has multiple related versions. Efficient tracing and version
control techniques must be incorporated in the management feature of the resource library.
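A very small C++ sketch of a versioned library is shown below, only to make the idea concrete. The class ResourceLibrary, its methods, and the choice of a plain text description per version are assumptions; the paper does not specify how the library or its version control is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical repository: each named element keeps all of its versions.
class ResourceLibrary {
public:
    // Stores a new version of the named element.
    // Returns the version number assigned, starting at 1.
    int store(const std::string& name, const std::string& description) {
        std::vector<std::string>& versions = entries_[name];
        versions.push_back(description);
        return static_cast<int>(versions.size());
    }

    // Returns the requested version, or nullptr if the name or version is unknown.
    const std::string* find(const std::string& name, int version) const {
        auto it = entries_.find(name);
        if (it == entries_.end()) return nullptr;
        if (version < 1 || version > static_cast<int>(it->second.size())) return nullptr;
        return &it->second[version - 1];
    }

    // Returns the latest version number, or 0 if the element is unknown.
    int latestVersion(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? 0 : static_cast<int>(it->second.size());
    }

private:
    std::map<std::string, std::vector<std::string>> entries_;
};
```

Older versions stay retrievable, so an analysis tool can be pointed at the exact description that a past design decision was based on.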


4.0  CONCLUSION 

The resource capture tool concept, which includes a graphic interface supporting creation
and editing of resource types, instantiation of components, interconnection of components and
subsystems, and organization of components and subsystems into libraries, is essential in the
development of large and complex systems. A robust and flexible mechanism is required for
characterizing system resources including hardware, software, and human operators. This resource
characterization approach is used to support the optimization of the system design in meeting
competing system requirements such as performance, reliability, cost, and security. It also
provides a vehicle for the selection of hardware, software, and human operation as well as the
trade-off between alternate hardware architectures and software partitioning early in the design
process.

1.0 Introduction

The production of high quality software systems (e.g., reliable, maintainable) is regarded
as an important goal for software development organizations. Much effort has been
devoted to assessing these systems based on quantitative measures of quality. Various
metrics have been proposed to represent quality as well as the characteristics of both the
software product and the system development process contributing to quality. It may not
be possible to directly measure quality during the early phases of development. However,
the product and process characteristics which emerge during these phases can be used as
early indicators of quality.

It  has  been  recognized  in  recent  years  that  no  single  attribute  of  product  or  process  can 
adequately  explain  quality.  Many  features  of  product  and  process  play  a  role  in 
determining  quality.  The  challenge  is  to  account  for  their  impacts  on  quality  outcomes  in 
some  integrated  fashion. 

We  describe  a  metrics  integration  framework  which  is  capable  of  merging  both  product  and 
process  oriented  measures  to  arrive  at  a  characterization  of  software  quality.  The 
framework  is  extensible  in  terms  of  being  able  to  incorporate  new  metrics  as  they  become 
available.  In  addition,  it  can  accommodate  a  variety  of  development  contexts  (e.g., 
incremental  builds,  commercial  off  the  shelf  integration,  reengineering)  and  programming 
languages.  Finally,  the  framework  is  adaptable  in  the  sense  of  being  able  to  incorporate 
data  collected  by  different  software  analyzers. 

2.0  Framework  Overview 

The  framework  serves  as  the  basis  both  for  software  metrics  research  and  for  the 
application  of  the  research  results  to  assess  software  development  projects. 

The  major  elements  of  the  metrics  integration  framework  shown  in  Figure  1  involve  data 
collection  and  analysis,  model  building  and  validation,  and  application  to  new  projects  to 
predict quality outcomes and prescribe system improvements. A database of projects is
maintained  and  continually  updated  to  provide  a  foundation  for  ongoing  analyses  and 
assessments. 


Figure 1: Overview of Integration Framework

Data collection and analysis serves two purposes. First, "historical" data is collected from
completed projects to populate a project database. This data is obtained from measurements
of software code through the use of software analyzers and from project records such as
problem tracking reports. Additional data is collected on the software development
environment. The building of the models and their validation depends on this historical
data.

Second, to make predictions for projects under development, data is collected during the
early development stages from software artifacts (such as designs). This data serves as
input to the models predicting software quality, for example, and for the prescription of
software improvements. As actual outcome data becomes available, it becomes part of the
"historical" project database. This data may be used for model validation and to identify
additional enhancements or improvements to the models.

The models thus far built predict the quality factors of reliability, maintainability, and
flexibility, as well as lines of code estimation based on early design features [1,2,3,6,7].
The analytical techniques thus far employed to calibrate the models include multivariate
regression analysis and ordered response approaches. Also, proportional hazards models
have been identified for the analysis of time between failures data. In addition to predicting
development outcomes, the models can also be used prescriptively to examine tradeoffs
among various software artifact characteristics.

3.0  Data  Collection  and  Analysis 

The  lack  of  reliable  and  appropriate  data  has  been  a  major  impediment  in  the  development 
of software metrics and their use in system development assessment. A systematic
approach  is  required  to  collect  consistent  and  accurate  data  across  development  projects. 

Details  of  a  mechanism  for  data  collection  and  analysis  are  shown  in  Figure  2.  The 
development  of  a  system  is  conducted  within  the  framework  of  a  development 
organization.  Three  types  of  data  can  be  collected:  software  artifact  data,  software  project 
data, and development environment data.


Figure 2: Data Collection and Analysis

3.1 Software Artifact Data

Software artifact data is available at different stages of the development process. The
artifact may be software code, for example. Or the artifact may be a software design
specified either by means of a design language, or through some Computer Aided Software
Engineering (CASE) tool which provides a representational capability for design
architecture. As will be discussed in further detail below, this raw data is fed into one or
more software analyzers which extract artifact features such as module size, declaration
counts, and control flow information. Many of these features are related to the complexity
characteristics of the software artifacts, and are used in the derivation of complexity
metrics.

3.2  Software  Project  Data 

The second type of data is that which generally cannot be collected from the analysis of
software artifacts. During the process of software development, data may be collected on
problems uncovered during testing, the effort to isolate or fix software faults, and the effort
required to enhance the software. Such data is increasingly available in electronic form,
and its extraction and analysis is amenable to automation.

Project  data  of  this  sort  provides  information  on  software  development  outcomes  and  is 
used to derive quality metrics associated with the software. For example, problem tracking
reports collected during testing can be used to identify fault-prone modules or subsystems.
Similarly,  effort  data  associated  with  maintenance  activities  may  help  to  identify  excessively 
complex  portions  of  the  system  which  are  difficult  to  maintain. 

3.3  Development  Environment  Data 

The last type of data characterizes the software development organization in which the
effort takes place. This data provides an indication of the organization's contribution to the
complexity of the development effort.

For example, an environment in which many requirements changes and additions are made
adds to the complexity of the development effort. These changes lead to unplanned
adjustments and modifications to the design or implementation. The details of these
changes and their impacts may not be well communicated within the organization, resulting
in fault propagation throughout the design or implementation.

As  another  example,  a  reuse  policy  in  a  development  organization  which  encourages  the 
use  of  previously  developed  software  artifacts  in  new  projects  and  provides  component 
libraries for its accomplishment will reduce the complexity of its development efforts.

The quality of the software development staff as measured by experience or educational
levels and the turnover rates of project personnel are additional characteristics which can
influence the complexity of the development effort.

The  level  of  integration  of  technology  (such  as  CASE  tools)  and  the  use  of  rigorous 
development  methodologies  (such  as  the  clean  room  approach)  may  also  influence  the 
development  effort's  complexity. 

A mature development environment as measured by an organization's ability to treat the
software development process as a measurable and controllable activity may also affect the
complexity of the development effort. The Software Engineering Institute (SEI) has
developed a process maturity framework for the ranking of software development
organizations according to process maturity.

3.4  Data  Analysis 

Each  of  the  three  classes  of  raw  data  discussed  above  must  be  processed  further  to  provide 
appropriate metrics for the modeling effort.

Software analyzers reduce the detail of software artifacts to a manageable set of measures.
Either source code or program design language code is parsed and then input to a metrics
algorithm processor for further analysis. The types of analyses include:

•  Dependency analyses to determine various kinds of intermodule dependencies

•  Complexity analyses to examine control flow and functional decomposition
complexity

•  Call graph analyses to determine how parts of a system use or are used by other
parts

•  Interface analyses to examine the passing of variables between different parts of
the program

•  Cross-reference analyses to report on the usage of symbolic names

•  Standards checking to examine whether specific project standards have been followed

Software analyzer outputs are not always in a form appropriate for building models and
using them for prediction purposes. The data must be expressed at appropriate levels of
software aggregation (e.g., modules, subsystems, configuration items) to facilitate the
analytical and evaluation objectives. One can imagine having to address questions about
software and its characteristics at both detailed and aggregate levels. For example, how
much testing effort should be allocated to a configuration item or subsystem? Or, given a
subsystem, which of its modules are expected to be more defect-prone?

To  be  used  as  input  for  analytical  models,  the  software  characteristics  must  be  represented 
by  summary  statistics.  For  example,  a  source  lines  of  code  count  may  represent  the  size  or 
magnitude  of  some  level  of  software  aggregation  such  as  a  subsystem.  Similarly,  the 
number  of  calls  by  a  subprogram  to  other  subprograms  could  represent  a  measure  of  its 
coupling  complexity. 
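
As a minimal sketch of one such summary statistic, the following Python fragment counts, for each subprogram, the number of distinct other subprograms it calls. The call graph, the subprogram names, and the function name `fan_out` are hypothetical and chosen only for illustration; in practice the graph would come from a software analyzer's call graph analysis.

```python
from typing import Dict, List


def fan_out(call_graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of distinct other subprograms called by each subprogram."""
    return {
        caller: len({callee for callee in callees if callee != caller})
        for caller, callees in call_graph.items()
    }


# Hypothetical call graph: caller -> list of callees (repeats allowed).
CALL_GRAPH = {
    "main": ["parse", "report", "parse"],
    "parse": ["read_token"],
    "report": ["format_line", "write", "report"],  # includes a recursive call
    "read_token": [],
}

coupling = fan_out(CALL_GRAPH)
print(coupling)
```

Repeated and recursive calls are deliberately not counted here, so the measure reflects how many other subprograms a unit depends on rather than how often it calls them; either choice could be defended.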

As with software artifact data, the project and development environment data must also be
expressed at appropriate levels of aggregation. For example, software fault data collected
from problem tracking reports may be rolled up to a subsystem or configuration item level.
Similarly, the number of software changes not related to fault correction may be aggregated
to obtain a development environment volatility measure at the project level.

Hybrid measures can be derived from the three categories of data discussed above. For
example, a fault density measure, defined as the number of faults divided by the number of
source lines of code, may be useful as a reliability related measure. Or the staff effort per
software change may be taken as a measure of maintainability.

This additional data analysis and reduction is accomplished by means of a set of utilities as
shown in Figure 2. The software artifact data generated by analyzers along with project
and development environment data are input to these utilities. The outputs are then stored
in the project database.
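
The kind of reduction such a utility performs can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the subsystem and module names, and the numbers are invented, and the fault density is scaled to faults per thousand source lines for readability.

```python
from collections import defaultdict

# Hypothetical module-level records: (subsystem, module, sloc, faults).
MODULES = [
    ("nav", "nav_filter", 1200, 3),
    ("nav", "nav_io", 800, 1),
    ("display", "render", 2500, 5),
    ("display", "layout", 1500, 0),
]


def roll_up(modules):
    """Aggregate SLOC and fault counts from module to subsystem level."""
    totals = defaultdict(lambda: {"sloc": 0, "faults": 0})
    for subsystem, _module, sloc, faults in modules:
        totals[subsystem]["sloc"] += sloc
        totals[subsystem]["faults"] += faults
    return dict(totals)


def fault_density_per_ksloc(sloc, faults):
    """Hybrid measure: faults per thousand source lines of code."""
    return 1000.0 * faults / sloc


subsystems = roll_up(MODULES)
densities = {
    name: fault_density_per_ksloc(t["sloc"], t["faults"])
    for name, t in subsystems.items()
}
print(densities)
```

The same roll-up could be applied at the configuration item or project level by changing the grouping key.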

If new metrics are developed or new data becomes available, these utilities, as a part of the
metrics integration framework, may be modified or augmented.

4.0  Model  Building  and  Validation 

System development can be viewed in economic terms as a production process. A result of
this process is a product, namely, a system of certain capabilities and characteristics created
by the application of resources. The characteristics of the system include, for example, the
size and complexity of the associated software, and quality aspects such as reliability and
maintainability. The resources involve labor to develop the system and capital such as
hardware resources to facilitate the development. These resources are brought together by
technologies consisting of software engineering methodologies and tools applied during the
development effort.

The  project  database,  consisting  of  historical  data  on  system  and  development  environment 
characteristics  and  development  outcomes,  is  the  foundation  for  model  building  and 
validation.  This  data  is  used  to  calibrate  multivariate  statistical  models.  These  models, 
once  calibrated,  are  validated  on  the  basis  of  additional  project  data  not  used  in  their 
calibration.  This  validation  process  provides  a  test  of  the  assumption  that  the  models  are 
applicable  to  a  range  of  projects  and  environments  different  from  the  ones  used  for 
calibration.

The  kinds  of  models  which  are  built  are  limited  by  the  availability  of  appropriate  data. 
Typically,  an  outcome  variable  characterizes  product  or  process  attributes  that  might  be 
desirable  to  predict  or  control.  There  may  be  a  need  for  the  early  identification  of  an 
outcome important for the management of a system development project. For example, a
manager might want to determine the total effort required to develop a system early in the
development life cycle. Therefore, the outcome variable might be the staff years of effort
required to complete the project. To aid in estimating the effort, a model may be built to
relate effort to the estimated source lines of code to be built as well as other variables such
as  the  complexity  of  the  proposed  system.  Effort  models  of  this  sort  have  been  well 
documented  in  the  literature  [4]. 

Our focus is on the prediction of software development outcomes related to quality,
namely reliability, maintainability, and flexibility, and on the prediction of lines of code
based on software artifacts as they develop through the life cycle.

4.1  Types  of  Models 

Several  major  categories  of  analytical  models  have  been  identified.  The  models  allow  us  to 
conduct  multivariate  analyses  relating  system  and  development  environment  characteristics 
to development outcomes. They differ with regard to the level of data aggregation and in
terms of the development outcome under consideration.

4.1.1  Multivariate  Linear  Regression  Models 

Models of this kind are appropriate when the development outcome variable is either
continuous in nature or can be approximated by a continuum. For example, defect density
(defects divided by lines of code) is continuous. On the other hand, the number of subsystem defects is
discrete. However, at the subsystem level, the number of defects tends to be large and may
be approximated as a continuous variable. At the module level, such an approximation
would not be appropriate and the ordered response models described in the next section
would be used.

Multivariate linear regression models are expressed as linear combinations of parameters
which must be calibrated. Let $Y$ be the outcome variable and $X_1, X_2, \ldots, X_m$ be the system
and development environment characteristic variables. The variable $Y$ is a random variable
whose expected value is estimated by some function of the explanatory variables
$X_1, X_2, \ldots, X_m$. We have used two forms of regression models: those linear in the variables and
those logarithmic in the variables. In either case, the equations are linear in the parameters.
They are expressed, respectively, as:

$$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_m X_m + e \qquad (1)$$

$$\ln(Y) = a_0 + a_1 \ln(X_1) + a_2 \ln(X_2) + \cdots + a_m \ln(X_m) + e \qquad (2)$$

where $a_0, a_1, \ldots, a_m$ are the parameters to be estimated, and $e$ is a residual term accounting
for the discrepancy between the actual value of $Y$ and its value as estimated by the
remaining terms on the right hand side.

To estimate the parameters, assumptions must be made about the probability distribution of
$Y$ and, hence, of $e$. The probability distribution of $e$ is usually assumed to be normal,
centered about the origin with standard deviation $\sigma$. A normal $e$ in equation (2) implies that
$Y$ is log-normally distributed, which ensures that $Y$ is never negative. For either
equation (1) or (2), a least squares approach can be used to estimate the parameters.

Logarithmic  forms  as  shown  in  equation  (2)  are  useful  in  representing  non-linear 
relationships  between  the  dependent  and  independent  variables.  The  logarithmic  form  also 
enforces  positivity  for  the  dependent  variable  if  all  the  terms  on  the  right-hand  side  are  real. 

Examples  of  multivariate  linear  regression  analyses  are  given  in  [6]  for  subsystem  level 
defects  and  in  [3]  for  subsystem  level  defect  densities. 
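
To make the calibration step concrete, here is a minimal sketch of a least squares fit of the logarithmic form (2), restricted to a single explanatory variable so that the closed-form solution can be written without a linear algebra library. The data are synthetic and noise-free, generated from a known power law purely for illustration; the function names are not from the cited analyses.

```python
import math


def fit_log_log(xs, ys):
    """Least-squares fit of ln(y) = a0 + a1*ln(x); returns (a0, a1)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def predict(a0, a1, x):
    """Back-transform the fitted log model to the original scale."""
    return math.exp(a0 + a1 * math.log(x))


# Synthetic, noise-free data generated from y = 2 * x**0.5 (illustration only).
XS = [1.0, 4.0, 9.0, 16.0, 25.0]
YS = [2.0 * math.sqrt(x) for x in XS]

a0, a1 = fit_log_log(XS, YS)
print(round(a0, 6), round(a1, 6))
```

With several explanatory variables the same idea applies, but the parameters would be obtained by solving the normal equations rather than from this one-variable formula.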

4.1.2  Ordered  Response  Models 

The  discreteness  of  defect  measures  becomes  apparent  when  analyses  are  conducted  at  the 
module  level.  The  number  of  defects  tends  to  be  small  and  many  modules  may  have  no 
defects  at  all.  The  discreteness  and  the  skewness  of  the  defect  distribution  toward  zero 
invalidates  the  normality  assumptions  associated  with  the  least  squares  approaches 
described  above. 

The number of defects can be viewed as categorical data. A module is classified as having
0,  1,  2,  ...  ,  n,  or  >n  defects.  Thus,  the  dependent  variable  is  a  discrete  categorical 
variable  with  n+2  categories.  In  this  case,  ordered  response  models,  as  discussed  by 
Gurland,  et  al.  [8],  can  be  used. 

We hypothesize that the number of defects associated with a module is related to a single
measure characterizing its complexity. This complexity measure is a function of both the
system and development environment characteristics which are assumed to be of
logarithmic form:

$$\ln(C^*) = a_0 + a_1 \ln(X_1) + a_2 \ln(X_2) + \cdots + a_m \ln(X_m) + e \qquad (3)$$

where $e$ is a normally distributed residual term. The logarithmic form assures that the
complexity is a positive quantity. The residual term in (3) incorporates all those
characteristics not taken into account in $X_1, X_2, \ldots, X_m$.

The composite complexity measure, $C^*$, is not observed directly. However, the number
of defects is related to complexity, and it is these defects that are observed. The number of
defects changes as this measure crosses different thresholds. Expressed mathematically,

$$b_{i-1} < \ln(C^*) \le b_i \;\Rightarrow\; \begin{cases} (i-1) \text{ defects} & \text{if } i = 1, 2, \ldots, n+1 \\ {>}n \text{ defects} & \text{if } i = n+2 \end{cases} \qquad (4)$$

where the $b_i$ represent the thresholds with $b_0 = -\infty$ and $b_{n+2} = +\infty$, and $b_i < b_{i+1}$ for
$i = 0, \ldots, n+1$.

Substituting equation (3) into equation (4) and rearranging provides constraints on the
residual, $e$:

$$b_{i-1} + A \ln(X) < e \le b_i + A \ln(X) \;\Rightarrow\; \begin{cases} (i-1) \text{ defects} & \text{if } i = 1, 2, \ldots, n+1 \\ {>}n \text{ defects} & \text{if } i = n+2 \end{cases} \qquad (5)$$

where the parameters and the variables are expressed as vectors, $A = (a_0 \;\, a_1 \;\, a_2 \;\, \cdots \;\, a_m)$ and
$X^T = (X_1 \;\, X_2 \;\, \cdots \;\, X_m)$, where $X^T$ is the transpose of the column vector $X$.

Given a probability density function (PDF) for the residual, the probabilities of 0, 1, ..., n, and >n
defects in a library unit can be calculated. The PDF is arbitrary relative to scale and
translation transformations, and may be chosen with unit standard deviation centered at the
origin. It is denoted by the function PDF(u) with corresponding cumulative distribution
function, CDF(u).

For a residual, $e$, with a normal probability density function (PDF), the
cumulative distribution function (CDF) is the normal error function, denoted here by erf(u). From
equation (5), the probability of zero defects is the integral of the PDF from $-\infty$ to
$b_1 + A \ln(X)$. Similarly, the probability of one defect is the integral from $b_1 + A \ln(X)$ to
$b_2 + A \ln(X)$, and so forth.

Thus, the defect probabilities can be written as:

$$\mathrm{Prob}(i \text{ defects}) = \mathrm{erf}(b_{i+1} + A \ln(X)) - \mathrm{erf}(b_i + A \ln(X)) \qquad (6)$$

for $i = 0, 1, 2, \ldots, n+1$ (where $i = n+1$ refers to >n defects).

Note that the probabilities given in (6) sum to unity. The expected number of defects is
given by:

$$E(\text{defects}) = \sum_{i=0}^{n} i \cdot \Pr(i) + E({>}n \text{ defects}) \cdot \Pr(n+1) \qquad (7)$$

where $E({>}n \text{ defects})$ is the mean number of defects for those library units with more than
$n$ defects.

An example of the application of this approach is given in [6] for defect estimation models
and in [2] for maintainability models.
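
The probability calculation in equations (6) and (7) can be sketched numerically as below. Several things here are assumptions rather than part of the original formulation: the residual limits are passed in directly as numbers (so no particular values of the thresholds or parameters are implied), the limits and the tail mean are invented, and the CDF that the text denotes erf(u) is taken to be the standard normal CDF, written in terms of Python's `math.erf`.

```python
import math


def normal_cdf(u):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def category_probabilities(upper_bounds):
    """Probabilities of 0, 1, ..., n and >n defects.

    upper_bounds holds the n+1 finite, increasing upper limits on the
    residual e for the categories 0..n defects, as in equation (5).
    The implicit outer limits are -infinity and +infinity.
    """
    cdf = [0.0] + [normal_cdf(u) for u in upper_bounds] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


def expected_defects(probs, mean_tail):
    """Equation (7): sum of i*Pr(i) for i <= n plus the >n tail term."""
    n = len(probs) - 2
    head = sum(i * probs[i] for i in range(n + 1))
    return head + mean_tail * probs[n + 1]


# Hypothetical residual limits for n = 2 (categories 0, 1, 2, >2 defects).
BOUNDS = [-0.5, 0.4, 1.2]
MEAN_TAIL = 4.0  # assumed mean defect count among units with >2 defects

probs = category_probabilities(BOUNDS)
print([round(p, 4) for p in probs], round(expected_defects(probs, MEAN_TAIL), 4))
```

Because the outer limits are infinite, the differences of CDF values telescope and the category probabilities sum to one, which is the property noted for equation (6).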

4.1.3  Proportional  Hazard  Models 

The models discussed in the previous two subsections are appropriate for the estimation of
software defects or faults. In this section, we focus on reliability as measured by the time
between failures. Defects or faults are developer oriented measures, while failures are
customer oriented. The developer wants to know how many defects must be corrected.
The customer wants to know how long he might be able to use the system with failure-free
operation. A failure is the result of a fault encountered during system use.

The  purpose  of  testing  is  to  identify  failures  leading  to  the  isolation  and  fixing  of  faults. 
Also,  during  field  operation,  system  execution  may  lead  to  more  failures  and  result  in  the 
repair of additional faults. As faults continue to be repaired, and the system has fewer
remaining faults, we expect higher reliability as measured by failure-free operation over
longer and longer periods of time.

The hazard function (also called the failure rate or force of mortality) for a system of
characteristics described by vector $X$ and cumulative execution time $T_e$ for the last failure
occurrence is defined by

$$h(t \mid X, T_e) = \frac{f(t \mid X, T_e)}{1 - F(t \mid X, T_e)} \qquad (8)$$

where $f(t \mid X, T_e)$ is the probability density function of $t$, the time to the next failure, and
$F(t \mid X, T_e)$ is the corresponding cumulative distribution function. The characteristics, $X$,
may vary with execution time $T_e$.

The hazard function represents the conditional probability rate of failure given that the
system has survived up to $T_e + t$. Thus $h(t \mid X, T_e)\,\Delta t$ is interpreted as the conditional
probability of failure in $[T_e + t,\; T_e + t + \Delta t]$ given that the system has not failed in $[T_e,\; T_e + t]$.

Reliability $R(t \mid X, T_e)$ is defined as the probability of failure-free operation in the half-open
interval $[T_e,\; T_e + t)$. It can be expressed in terms of the cumulative distribution function as

$$R(t \mid X, T_e) = 1 - F(t \mid X, T_e) \qquad (9)$$

Recognizing that the probability density function is the derivative of the cumulative
distribution function, substituting (9) into (8) and integrating, we can express the reliability
in terms of the hazard function as

$$R(t \mid X, T_e) = \exp\left\{ -\int_0^t h(s \mid X, T_e)\, ds \right\} \qquad (10)$$

Another measure of reliability, the mean time between failures (MTBF), can be expressed
as

$$\mathrm{MTBF}(X, T_e) = \int_{T_e}^{\infty} R(t \mid X, T_e)\, dt \qquad (11)$$
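
The step from (8) and (9) to (10) can be spelled out as follows. This is the standard survival-analysis argument, with the conditioning on $X$ and $T_e$ suppressed for brevity and with $R(0) = 1$ assumed, that is, the system is taken to be operating at the start of the interval.

```latex
\begin{align*}
f(t) &= \frac{dF(t)}{dt} = -\frac{dR(t)}{dt}
  && \text{from (9)} \\
h(t) &= \frac{f(t)}{1 - F(t)} = -\frac{1}{R(t)}\,\frac{dR(t)}{dt}
      = -\frac{d}{dt}\ln R(t)
  && \text{from (8)} \\
\ln R(t) - \ln R(0) &= -\int_0^t h(s)\,ds \\
R(t) &= \exp\left\{-\int_0^t h(s)\,ds\right\}
  && \text{using } R(0) = 1
\end{align*}
```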

In order to simplify statistical estimation, Cox [5] proposed a separable form for the hazard
function (8) given by

$$h(t \mid X, T_e) = h_0(t) \exp(X\beta + \alpha\, g(T_e)) \qquad (12)$$

where vector $\beta$ and scalar $\alpha$ are parameters to be estimated, $g(T_e)$ is some function of the
execution time, and $h_0(t)$ is the baseline hazard function.
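
As an illustration of how the separable hazard (12) feeds into the reliability expression (10), the sketch below integrates the hazard numerically with the trapezoid rule. Everything specific in it is an assumption made for the example: the characteristic values, the parameter values, the already-evaluated number standing in for $g(T_e)$, and in particular the constant baseline hazard, which the approach itself does not require.

```python
import math


def hazard(t, x, beta, alpha, g_te, baseline):
    """Separable hazard of equation (12): h0(t) * exp(X.beta + alpha*g(Te))."""
    linear = sum(xi * bi for xi, bi in zip(x, beta)) + alpha * g_te
    return baseline(t) * math.exp(linear)


def reliability(t, x, beta, alpha, g_te, baseline, steps=1000):
    """Equation (10): exp(-integral of the hazard over [0, t]), trapezoid rule."""
    if t == 0:
        return 1.0
    dt = t / steps
    values = [hazard(k * dt, x, beta, alpha, g_te, baseline) for k in range(steps + 1)]
    integral = dt * (sum(values) - 0.5 * (values[0] + values[-1]))
    return math.exp(-integral)


# Hypothetical inputs: two characteristics, constant baseline hazard,
# and g(Te) already evaluated to a number.
X = [1.0, 2.0]
BETA = [0.3, -0.1]
ALPHA = -0.05
G_TE = 10.0
H0 = 0.02  # failures per hour of execution, assumed constant

r = reliability(50.0, X, BETA, ALPHA, G_TE, lambda t: H0)
print(round(r, 6))
```

With a constant baseline the result reduces to an exponential in $t$, which gives a simple check on the numerical integration; a time-varying baseline can be supplied through the same `baseline` argument.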

Prentice, et al. [9] suggest a variation of equation (12) whereby the hazard function
depends on $T = T_e + t$ and can be written as

$$h(T \mid X) = h_0(T) \exp(X\beta) \qquad (13)$$


In order to estimate $\beta$ and $h_0(t)$ from empirical data on failure rates, we could attempt to
maximize the likelihood function for the observed data simultaneously for $\alpha$, $\beta$ and $h_0(t)$ in
equation (12). But based on the separable form for the hazard function, Cox proposed an
approach for survival data [5] which was extended by Prentice, et al. [9] to repairable
systems. A partial likelihood technique was identified whereby the likelihood function for
the estimation of $\alpha$ and $\beta$ in (12) does not depend on $h_0(t)$. Once $\beta$ has been estimated,
$h_0(t)$ can be estimated in nonparametric form through another likelihood function.
Prentice, et al. [9] proposed a similar solution for hazard functions of the form (13).

The approach outlined above is powerful in that it makes no assumptions about the
probability distribution of the time between failures in (12) or the cumulative execution time
to failure in (13). It is a semi-nonparametric approach in that the reliability estimates
depend parametrically on the characteristics but nonparametrically on the time between
failures in (12) or the cumulative execution time to failure in (13).

5.0  Development  Outcome  Prediction 

Models of the types described above can be used for predictive purposes on new projects
once they have been calibrated on the basis of previous project data. The new projects are
at a developmental stage where outcome data is not available, but software and
developmental characteristic data, $X$, are. For example, the complexity characteristics of
software begin to emerge during the design stage as the system architecture is developed.
A model that relates these complexity characteristics to quality outcomes such as defects can
be used to predict the cumulative defects at the end of the testing stage. However, care
must be taken when applying the model for predictive purposes. Not all of the variables
that could determine outcomes are incorporated in the model. For example, the quality of
the development staff may have been excluded as an explanatory variable. If the staff
quality for the projects used to calibrate the model were substantially higher than that for the
project to be predicted, we might expect the defect predictions to be underestimated.

The lesson to be learned from the above is that models for prediction should not be applied
in a casual fashion. The user must have a strong understanding of:

•  the  basis  on  which  a  model  was  derived 

•  the range of variation of values of the explanatory variables

•  the potential explanatory variables excluded from the model

•  the potential impact of these excluded variables.

A large dose of judgment may sometimes be required to use a predictive model
intelligently. One may use a model to predict values for some outcome variable of interest.
Or the model may be used to rank order a collection of modules, for example, on the basis
of their defects.

The latter approach could be less stringent regarding the assumptions on which the model
is based. Suppose a model of the kind expressed by equation (2), for example, were used
to predict defect density. Calibrating the model on the basis of a set of projects developed
with relatively high quality staff would result in specific values of the parameters
$a_0, a_1, \ldots, a_m$. If a second set of projects involving low quality staff were pooled with the first set
and the model were recalibrated, we might have a model of the sort

$$\ln(Y) = a_0 + b_0 D_s + a_1 \ln(X_1) + a_2 \ln(X_2) + \cdots + a_m \ln(X_m) + e \qquad (14)$$

where $D_s$ is a dummy variable equal to zero for the projects with high staff quality and
unity for the projects with low staff quality. Thus, the impact of staff quality is in the
difference of the constant terms for the two types of projects. High staff quality projects
have a constant term $a_0$ while the constant term for projects with low staff quality is $a_0 + b_0$.
A positive $b_0$ would yield a higher defect density for the projects with lower staff quality.
A model of the form (14) is, in principle, empirically testable, as is the hypothesized sign
for $b_0$.

Suppose staff quality enters as a dummy variable as in (14) or as an additional additive term
$a_{m+1} \ln(X_{m+1})$. If these terms were ignored and a model of the form (2) were calibrated
only on the basis of projects with high quality staff, then it would be possible only to rank-order
subsystems of new projects, for example, according to their relative defect densities.
To provide information about the actual defect densities would require knowledge about $b_0$
or $a_{m+1}$.
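
The rank-ordering argument can be illustrated for the dummy-variable case with a short sketch. The coefficients, subsystem names, and complexity values below are hypothetical, a single explanatory variable is used, and the residual is omitted. Since the omitted term $b_0 D_s$ is the same for every subsystem of a given project, it multiplies every predicted density by the same factor and leaves the ordering unchanged.

```python
import math


def predicted_density(a0, a1, x, b0=0.0, low_quality=0):
    """Equation (14) with one explanatory variable and the residual omitted."""
    return math.exp(a0 + b0 * low_quality + a1 * math.log(x))


# Hypothetical coefficients and subsystem complexity values.
A0, A1, B0 = 0.5, 0.4, 0.3
SUBSYSTEMS = {"alpha": 12.0, "bravo": 3.0, "charlie": 30.0}

# Model calibrated on high-quality-staff projects only (b0 unknown, ignored).
without_b0 = {s: predicted_density(A0, A1, x) for s, x in SUBSYSTEMS.items()}
# Densities for a low-quality-staff project under equation (14).
with_b0 = {s: predicted_density(A0, A1, x, B0, 1) for s, x in SUBSYSTEMS.items()}

rank_without = sorted(SUBSYSTEMS, key=without_b0.get, reverse=True)
rank_with = sorted(SUBSYSTEMS, key=with_b0.get, reverse=True)
ratios = [with_b0[s] / without_b0[s] for s in SUBSYSTEMS]
print(rank_without, rank_with)
```

Every ratio equals $\exp(b_0)$, so the two rankings agree while the absolute densities differ by that common factor.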

While  the  initially  calibrated  model  may  not  provide  information  about  either  bo  or  am+i, 
there  are  contexts  in  which  this  information  can  be  obtained  through  analytical  means.  For 
example,  if  a  project  is  developed  as  a  series  of  incremental  builds,  information  about 
defects,  for  example,  associated  with  build  N,  can  be  used  to  provide  an  estimate  of  the 
parameter,  bo,  and  the  resulting  recalibrated  model  can  be  used  to  predict  defects  for 
subsequent  builds  N+r  where  r>0. 
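
One simple way to carry out such a recalibration, offered here as an assumption rather than as the authors' procedure, is to hold the slope coefficients fixed and estimate b0 as the mean residual of the build N observations in log space. With fixed slopes this is the least-squares estimate of an intercept shift. All numbers below are hypothetical.

```python
import math

def base_log_prediction(a0, coeffs, xs):
    """ln(Y) from the initially calibrated model, without the b0 term."""
    return a0 + sum(a * math.log(x) for a, x in zip(coeffs, xs))

def estimate_b0(a0, coeffs, observations):
    """Mean log-space residual over (xs, observed_density) pairs from build N."""
    residuals = [math.log(y) - base_log_prediction(a0, coeffs, xs)
                 for xs, y in observations]
    return sum(residuals) / len(residuals)

# Hypothetical calibrated parameters and synthetic build N data.
a0, coeffs = 0.5, [0.45, 0.20]
true_shift = 0.25  # used only to synthesize the example observations
build_n = [
    (xs, math.exp(base_log_prediction(a0, coeffs, xs) + true_shift))
    for xs in ([8.0, 3.0], [12.0, 5.0], [20.0, 2.0])
]

b0_hat = estimate_b0(a0, coeffs, build_n)

# Predicted defect density for a subsystem of a later build N+r.
next_xs = [15.0, 4.0]
predicted = math.exp(base_log_prediction(a0, coeffs, next_xs) + b0_hat)
```

With real data the residuals would scatter around b0, and the precision of the estimate would depend on how many subsystems of build N are observed.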

6.0  Software  Improvement  Prescription 

The  previous  section  discussed  the  use  of  models  to  predict  development  outcomes.  In  this 
section,  we  focus  on  the  use  of  statistical  models  to  prescribe  changes  for  improving 
system quality.

In Ada systems, context coupling of library unit aggregations (LUAs) is a contributor to
software complexity and, hence, to defect densities [1]. A LUA consists of a specification,
a body, and any related subunits. Context coupling of LUAs is achieved by the use of a
"with" clause and allows for the exportation and importation of visible declarations among
LUAs. A LUA may use any of the imported declarations as resources in its
implementation.

The relationship between context coupling and defect density was found to be [3]

ln(Defect Density) = .53 + .45*ln(Context Coupling)    (15)

with a coefficient of determination, R^2, equal to .63.

Now consider a design with a context coupling profile as shown by the solid line in Figure
3. This figure represents the percent of LUAs having context coupling greater than a
certain value. For example, about twenty-seven percent of the LUAs have a context
coupling greater than fifteen and about seven percent of the LUAs have a context coupling
of forty-five or more. The mean context coupling of this design is 10.2 "withs" per library
unit aggregation. Using equation (15), the system has a predicted defect density of 4.81
defects per thousand lines of source code. A one million source line of code system would
contain 4810 defects.

A  redesign  of  the  system  might  consider  those  LUAs  in  the  tail  end  of  the  profile.  These 
LUAs  tend  to  be  large  and  usually  require  extensive  resources  imported  from  other  LUAs 
through  context  coupling.  A  programmer  responsible  for  implementing  such  LUAs  is 
confronted  with  a  large  number  of  declarations,  including  those  defined  within  the  LUA  as 
well  as  those  which  are  imported  through  context  coupling.  The  resulting  complexity  may 
lead  to  larger  defect  densities  in  these  LUAs. 


Dividing the large context coupling LUAs (>45 "withs") in the tail of Figure 3 into smaller
units reduces the context coupling profile to that shown by the dashed line. The redesigned
system has an average coupling of 7.1 "withs" per LUA. The predicted defect density is
then reduced to 4.08 defects per thousand lines of source code. Assuming that the
redesigned system has approximately one million source lines of code, the predicted
number of defects is 4080 for a net saving of 730 defects. The number of defects is thus
reduced by about fifteen percent.
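
The arithmetic behind these figures can be retraced with a short Python sketch that evaluates equation (15) at the two mean coupling values. It uses the rounded coefficients as printed (.53 and .45), so the results come out close to, but not exactly equal to, the 4.81 and 4.08 quoted above; the small gap is presumably due to rounding of the published coefficients, which is an assumption on our part.

```python
import math

def defect_density(context_coupling, intercept=0.53, slope=0.45):
    """Predicted defects per thousand source lines from equation (15)."""
    return math.exp(intercept + slope * math.log(context_coupling))

before = defect_density(10.2)   # original design
after = defect_density(7.1)     # redesigned system

ksloc = 1000  # one million source lines, expressed in thousands
defects_saved = (before - after) * ksloc
relative_reduction = 1 - after / before

print(f"before: {before:.2f} defects/KSLOC")
print(f"after:  {after:.2f} defects/KSLOC")
print(f"saved:  {defects_saved:.0f} defects ({relative_reduction:.1%})")
```

The relative reduction of about fifteen percent does not depend on the intercept, since the constant term cancels in the ratio of the two predictions.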

This  analysis  can  be  extended  to  estimate  the  cost  saving  of  the  redesign.  The  cost  saving 
depends  on  the  testing  cost  to  uncover  each  of  the  730  defects,  the  cost  of  isolating  the 
defect  to  a  specific  portion  of  code,  the  cost  of  fixing  the  defect,  and  the  cost  of  additional 
regression  testing. 

Another way of looking at the analysis leading to equation (15), which does not depend on
the constant term, is to take the differential of (15), yielding

ΔDD/DD = .45*(ΔCC/CC)    (16)


where  DD  refers  to  defect  density  and  CC  to  context  coupling. 

This equation indicates that a ten percent change in the context coupling results in a 4.5
percent change in the defect density. Reducing context coupling from 10.2 to 7.1
represents about a thirty percent change. This leads to about a fifteen percent change in
defect density, consistent with the analysis above.
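
As a check on this reasoning, the following Python sketch compares the first-order estimate from equation (16) with the exact relative change implied by the log-linear form of equation (15). The variable names are ours; only the slope .45 and the coupling values 10.2 and 7.1 come from the text.

```python
slope = 0.45
cc_before, cc_after = 10.2, 7.1

# Relative change in context coupling (about -30 percent).
rel_cc = (cc_after - cc_before) / cc_before

# First-order estimate from equation (16).
rel_dd_linear = slope * rel_cc

# Exact relative change implied by equation (15); the constant term cancels.
rel_dd_exact = (cc_after / cc_before) ** slope - 1

print(f"coupling change:       {rel_cc:.1%}")
print(f"equation (16) estimate: {rel_dd_linear:.1%}")
print(f"exact log-linear value: {rel_dd_exact:.1%}")
```

The differential form is a linear approximation, so for a change as large as thirty percent it slightly understates the reduction obtained from the full log-linear relationship.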

The utility of equation (16) is that knowledge of the value of the constant term in equation
(15) is not required. Therefore, it may have greater applicability across projects
substantially different from the ones used to estimate equation (15).

7.0  Conclusions 


We have presented a framework for the development of software metric models and their
applications to the assessment of development projects. The framework is robust in that it
can incorporate a variety of software metrics and can accommodate different programming
languages and development contexts.

We  have  discussed  several  statistical  analysis  techniques  which  span  different  levels  of 
system  aggregation  and  focus  on  different  outcome  variables.  These  techniques  provide 
the  basis  for  building  models  to  predict  development  outcomes  and  to  prescribe  system 
changes  that  may  improve  development  outcomes. 

This framework is the foundation for continuing work to enhance the technology for
quantitative system evaluation and improvement.

ABSTRACT

The purpose of this paper is to present a multifaceted approach to the
measurement and evaluation of complex Navy system designs with embedded
real-time mission critical characteristics. The approach advocates the
use of a visual simulation model, constructed in the Visual Simulation
Support Environment (VSSE), representing the design so as to achieve
dynamic measurement of the design. A knowledge-based approach is proposed
for the independent evaluation of hundreds of design indicators. The
Objectives / Principles / Attributes (OPA) framework is used as the
underlying structure for the measurement and evaluation of all three
major aspects of a system design: project, process, and product.

1. INTRODUCTION

A complex Navy system is composed of three major components: software,
hardware, and “humanware”. These components are intertwined with
real-time mission critical characteristics and pose significant technical
challenges for system designers, developers, and maintainers independent
of warfare area.

A critically important phase in the engineering of complex systems
methodology is system design measurement and evaluation. Before making
the commitment to spend millions of dollars for building a system, its
design must be carefully evaluated.

The purpose of this paper is to present an approach for measurement and
evaluation of complex Navy system designs. Section 2 describes some basic
concepts of measurement and the terminology. The proposed approach based
on the use of simulation modeling, indicators (metrics), and the OPA
framework is described in Section 3. After concluding remarks in
Section 4, a bibliography is given.

2.  THE  MEASUREMENT  SCHEME 

Measurement is an important activity of interest in many disciplines.
However, a standard terminology does not exist. Different terms are used
in different disciplines to convey the same notion. The term “Metric” is
used in Software Engineering in measurement of software quality
characteristics and software project, process, and products. The terms
“Measure” and “Index” are used in Computer Performance Evaluation in
measurement of different aspects of a computer system. The terms “Scale”
and “Factor” are used in statistical measurement. The term “Indicator” is
used in Economics for measurement of the economy (e.g., leading economic
indicators) and in Psychometric Theory in measurement of psychological
problems.

The common goal in all these disciplines is to try to accurately measure
a concept which can be either quantitative or qualitative. Measurement of
quantitative concepts (e.g., response time, throughput, utilization) can
be done directly, whereas qualitative concepts (e.g., design utility,
maintainability, complexity) must be measured indirectly by using a
hierarchy of indicators as shown in Figure 1.

In this paper, we use the term “indicator” and define it as an indirect
measure of a qualitative concept. We define the term “metric” as an
indicator the value of which can be computed by using a formula. A
literature survey reveals three categories of indicators: (1) process
indicators (e.g., standards, design methodology), (2) product indicators
(e.g., execution efficiency, reliability, usability), and (3) external
indicators (e.g., security, physical, financial).

Figure 1. Measurement of Qualitative Concepts

A qualitative concept (e.g., system design utility) is measured by N
indicators at the first level. Those indicators which cannot be directly
measured are further decomposed into other indicators at the second
level. This decomposition continues until the indicators at the base
level (i.e., the ones that are not decomposed further) are directly
measurable. Figure 2 presents a hierarchy of indicators for measurement
of system design utility. Note that the decomposition of indicators in
Figure 2 is not carried out completely, and many indicators shown at the
base level are not directly measurable.
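
The decomposition described above is naturally represented as a tree. The following Python sketch is only an illustration: the class name, its methods, and the particular nesting of the example indicators (taken from Figure 2) are our assumptions, not part of the paper. It collects the base-level indicators, i.e., the ones that are not decomposed further.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Indicator:
    """A node in a hierarchy of indicators (hypothetical structure)."""
    name: str
    children: List["Indicator"] = field(default_factory=list)

    def base_level(self) -> List[str]:
        """Names of indicators that are not decomposed further."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.base_level())
        return names

# A small fragment of the hierarchy; the nesting shown is assumed.
utility = Indicator("System Design Utility", [
    Indicator("Performance", [
        Indicator("Response Time"),
        Indicator("Throughput"),
    ]),
    Indicator("Reliability", [
        Indicator("MTTF"),
        Indicator("MTBF"),
    ]),
])

leaves = utility.base_level()
print(leaves)
```

Whether a base-level indicator is in fact directly measurable is a separate question that the tree structure alone does not answer.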

Measurement of a concept using indicators carries some error:

CONCEPT = INDICATORS + ERROR

Validation and verification of indicators deal with determining the
amount of error in the measurement process and require collection of data
from the application of the indicators to a large number of cases. This
data collection is extremely time consuming and very expensive in many
cases.


3.  THE  PROPOSED  APPROACH 

Measurement and evaluation of a complex Navy system design represented in
a static manner (on paper) cannot be made convincingly. The system being
represented is typically embedded within a larger system. Performance
parameters are carefully defined, dependent on an equally careful
assessment of mission requirements. Introducing even greater challenge is
the real-time or time-critical dimension of system behavior. Therefore,
the system design must be represented in a dynamic fashion so that its
measurement and evaluation can capture all of its dynamic
characteristics. We propose to represent a complex Navy system design in
the form of a visual simulation model and experiment with it for the
purpose of measurement and evaluation of the system design as illustrated
in Figure 3. However, we realize that the development of a visual
simulation model of a complex system design is very complex and difficult
itself. Therefore, we propose to use the Visual Simulation Support
Environment (which can also be characterized as a computer-aided visual
simulation software development environment), a fully functional
prototype of which has been constructed at Virginia Tech. The simulation
environment is briefly described in Section 3.1.

The measurement of a complex Navy system

System Design Utility
Performance
Response Time
Efficiency
Device efficiency
Accessibility
Conciseness
Predictability
Throughput
Time Complexity
Reliability
MTTF - Mean Time to Failure
MTBF - Mean Time Between Failures
Accuracy
Informal Testing
Desk Checking
Structured Walkthroughs
Inspections
Reviews
Audit
Static Testing
Structural Analysis
Consistency Checking
Dynamic Testing
Top-Down Testing
Bottom-Up Testing
Black-Box Testing
White-Box Testing
Stress Testing
Execution Monitoring
Execution Profiling
Regression Testing
Symbolic Testing
Path Analysis
Cause-Effect Graphing
Constraint Testing
Assertion Testing
Boundary Analysis
Inductive Assertions
Fault Tolerance
Graceful Degradation
Redundancy
Crash Recoverability
MTTR - Mean Time to Repair
Availability
Computation Heavy Process (Stress) Effects
Complexity
Size
Halstead's program length
n1
n2
Jensen's program length
n1
n2
Function Points
McCabe's cyclomatic number
Henry/Kafura information flow metric
IF4 information flow metric
Belady's bandwidth
System complexity
Data complexity (D)
Structural complexity (S)
Procedural complexity
Functional complexity
Whitworth & Szulewski control flow complexity
Whitworth & Szulewski data flow complexity
McClure's Module Invocation Complexity
Woodfield's Review Metric
Woodward's K
Chen's MIN
Benyon-Tinker's Cx
Intensity of program use
Program age
Maintainability
Correctability
Extensibility
Complexity†
Adaptability
Information hiding
Coupling
Cohesion
Well-Defined Interface
Modularity
Connectivity
Functional connectivity
connectivity
Stability
Yau & Collofello stability metric
Requirements Stability
Cost
Development Cost
COCOMO
Size†
Complexity†
SLIM
ESTIMACS
Testing Cost
Maintenance Costs
Operation Costs
Productivity
Source lines of code per work month
Dollars expended per line of source code
Purchase Costs
Personnel Costs
Cost of implementing security barriers
Initial cost
Analysis of security requirements
Implementation
Validation and testing
Operational cost
Extra CPU time required for logging
Encipherment time coefficient
Maintenance of authorization matrix
Restriction of previously offered services
Security
Expected time to break password(s)
Size of the authorization matrix
Depth of the authority levels
Quality of the introduced cryptography system
Usability
User intensity
Required number of operators
Number of simultaneous users
Ease of use
User response time
Completeness
Communicativeness
Training

† See the decomposition of this indicator given earlier in the hierarchy.

Figure 2. A Hierarchy of Some Indicators for System Design Measurement

User interface
# of Mnemonics/commands
Length of mnemonics (commands)
# of mnemonics beginning with same character
Response time
Average # choices in menu
Avg. # of menus in one process sequence
Good or bad graph representations
Good or bad error messages
Number of error messages
Installability
iofDowtwday
Reusability
Information Hiding†
Hierarchical Decomposition
Functional Decomposition
Modularity
Portability
Complexity†
Testability
Hierarchical Decomposition
Functional Decomposition
Information Hiding†
Exhaustivity of test data
Modularity
Adverse testing effects
Complexity†
Understandability
Number of 1
Amount of data processed
Information hiding†
Modularity
Correctness
Requirements Traceability
Requirements Definition
Completeness
Consistency

† See the decomposition of this indicator given earlier in the hierarchy.

Figure 2. A Hierarchy of Some Indicators for System Design Measurement (Continued)


design involves indicators that can be assessed only by experts with
intimate knowledge of the mission areas for which the system is intended.
Therefore, we propose to use a knowledge-based approach as described in
Section 3.2.

We envision the use of hundreds of indicators for the measurement and
evaluation of a complex Navy system design. Some of these indicators can
be measured by system engineers, some (especially those indicators for
dynamic system characteristics) can be measured by using the visual
simulation model representing the system design, and some can be measured
by only the warfare domain experts.

All these indicators need to be applied under a framework in order for
the measurement to be effective. The needed framework is the
Objectives/Principles/Attributes (OPA) framework developed at Virginia
Tech and is briefly described in Section 3.3.

3.1  The  Visual  Simulation  Support  Environment 

The ever-increasing complexity of visual simulation model development is
undeniable. A simulation programming language supports only the
programming process, one of 10 processes in the life cycle of a
simulation study [Balci 1990]. Automated support throughout the entire
visual simulation model development life cycle is crucially needed. This
support can be provided in the form of an environment composed of
integrated software tools providing computer-aided assistance in the
development and execution of a visual simulation model.

A collection of computer-based tools makes up a development environment
if, and only if, the tools are highly integrated and work under a
unifying Conceptual Framework (CF). The Simulation Model Development
Environment (SMDE) research project [Balci 1986; Balci and Nance 1987a,
1992] at Virginia Tech has recently achieved the automation-based
software paradigm [Balci and Nance 1987b] and developed: (1) the
multifaceteD cOnceptual fraMework for vIsual simulatioN mOdeling (DOMINO)
[Derrick 1992; Derrick and Balci 1992a], (2) the Visual Simulation
Support Environment (VSSE) [Derrick 1992; Derrick and Balci 1992b], and
(3) the Visual Simulation Model Specification Language (VSMSL) [Derrick
1992; Derrick and Balci 1992c].

The rapid prototyping technique has been used in the VSSE's evolutionary
joint development with the DOMINO. Many VSSE tool prototypes have been
developed, implemented, experimented with, and documented. Some
prototypes have been discarded; however, the experience and knowledge
gained through experimentation with those prototypes have been kept.

Figure 4 depicts the VSSE architecture in four layers: (0) Hardware and
Operating System, (1) Kernel VSSE, (2) Minimal VSSE, and (3) VSSEs.

Figure 3. Simulation Model-Based Evaluation of Complex Navy System Designs Using Indicators

Figure 4. Visual Simulation Support Environment Architecture

3.1.1 Layer 0: Hardware and Operating System

A Sun computer workstation constitutes the hardware of the VSSE. The UNIX
SunOS operating system and utilities, SunView graphical user interface,
and INGRES relational database management system constitute the software
environment upon which the VSSE is built.

3.1.2 Layer 1: Kernel Visual Simulation Support Environment

Primarily, this layer integrates all VSSE tools into the software
environment described above. It provides INGRES databases, communication
and run-time support functions, and a kernel interface. Three INGRES
databases occupy this layer, labeled project, premodels, and assistance,
each administered by a corresponding manager in layer 2. All VSSE tools
are required to communicate through the kernel interface. Direct
communication between two tools is prevented to make the VSSE easy to
maintain and expand. The kernel interface provides a standard
communication protocol and a uniform set of interface definitions.
Protection is imposed by the kernel interface to prevent any unauthorized
use of tools or data.
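
The mediating role of the kernel interface can be sketched in a few lines of Python. This is a conceptual illustration only: the VSSE itself was written in C, and the class, method names, and message format below are hypothetical, not the actual kernel interface. The sketch shows tools registering with a single kernel object, all messages passing through it, and unauthorized use being rejected.

```python
class KernelInterface:
    """Hypothetical mediator: every tool-to-tool message passes through here."""

    def __init__(self):
        self._handlers = {}      # tool name -> callable(sender, message)
        self._permitted = set()  # (sender, receiver) pairs allowed to communicate

    def register(self, tool_name, handler):
        """Add a tool; it only needs to conform to the handler signature."""
        self._handlers[tool_name] = handler

    def permit(self, sender, receiver):
        """Authorize one tool to use another."""
        self._permitted.add((sender, receiver))

    def send(self, sender, receiver, message):
        """Deliver a message, enforcing registration and authorization."""
        if receiver not in self._handlers:
            raise KeyError(f"unknown tool: {receiver}")
        if (sender, receiver) not in self._permitted:
            raise PermissionError(f"{sender} may not use {receiver}")
        return self._handlers[receiver](sender, message)


kernel = KernelInterface()
kernel.register("Model Analyzer",
                lambda sender, msg: f"analyzed {msg} for {sender}")
kernel.permit("Model Generator", "Model Analyzer")

reply = kernel.send("Model Generator", "Model Analyzer", "model-1")
print(reply)
```

Because tools never hold references to one another, a tool can be replaced or added by changing only its registration with the kernel, which is the maintainability argument made above.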

3.1.3 Layer 2: Minimal Visual Simulation Support Environment

This layer provides a “comprehensive” set of tools which are “minimal”
for the development and execution of a visual simulation model.
“Comprehensive” implies that the toolset is supportive of all model
development phases, processes, and credibility assessment stages.
“Minimal” implies that the toolset is basic and general. It is basic in
the sense that this set of tools enables modelers to work within the
bounds of the minimal VSSE without significant inconvenience. Generality
is claimed in the sense that the toolset is generically applicable to
various simulation modeling tasks.

Minimal VSSE tools are classified into two categories. The first category
contains tools specific to simulation modeling: Project Manager,
Premodels Manager, Assistance Manager, Model Generator, Model Analyzer,
Model Translator, Model Verifier, and Visual Simulator. The second
category tools (also called assumed tools or library tools) are expected
to be provided by the software environment of Layer 0: Electronic Mail
System, Document Preparation System, and Text Editor.

3.1.4 Layer 3: Visual Simulation Support Environments

This is the highest layer of the environment, expanding on a defined
minimal VSSE. In addition to the toolset of the minimal VSSE, it
incorporates tools that support specific applications and are needed
either within a particular project or by an individual modeler. If no
other tools are added to a minimal toolset, a minimal VSSE would be a
VSSE.

The VSSE tools at layer 3 are also classified into two categories. The
first category tools include those specific to a particular area of
application. These tools might require further customizing for a specific
project, or additional tools may be needed to meet special requirements.
The second category tools (also called assumed tools or library tools)
are those anticipated as available due to use in several other areas of
application: analysis of simulation output data, designing simulation
experiments, documentation and credibility assessment, and input data
modeling. Some examples of such tools comprise layer 3.

A VSSE tool at layer 3 is integrated with other VSSE tools and with the
software environment of layer 0 through the kernel interface. The
provision for this integration is indicated in Figure 4 by the opening
between Project Manager and Text Editor. A new tool can easily be added
to the toolset by making the tool conform to the communication protocol
of the kernel interface.

The VSSE was developed by using the C programming language, SunView
graphical user interface, Sun programming environment, and INGRES
relational database management system Embedded QUEry Language/C
(EQUEL/C). It encompasses more than 50,000 lines of documented code and
runs on a Sun color workstation.

Currently, we are building the production version of the VSSE under the
NeXTStep object-oriented display postscript-based Operating System.

3.2 The Knowledge-Based Evaluation

The overall evaluation of a complex Navy system design must be conducted
independently. The organization which creates the system design is not
qualified to also perform its final overall evaluation because of the
“developer's bias”. We envision the scenario illustrated in Figure 5 in
which an independent organization is charged with the task of measurement
and evaluation of the system design. This organization can be independent
of the sponsoring and developing organizations or it can be a branch
within the sponsoring organization.

Based on the warfare domain the system design is intended for, hundreds
of indicators should be identified for measurement and evaluation. Some
of these indicators can be measured by the use of a simulation model
representing the system design, some can be computed by using a formula,
and some need to be assessed by expert people who have intimate knowledge
of the warfare domain.

[Diagram labels: Sponsoring Organization; Organization Designing the Navy System; Independent Organization Responsible for the Evaluation of the Navy System Design; Problem Domain Specific Indicators; Knowledge About the Relationships, Dependencies Among the Indicators]

Figure 5. Knowledge-Based Evaluation of Complex Navy System Designs Using Indicators

The assessment of indicators requires knowledge about the problem domain
and knowledge about the relationships and dependencies among the
indicators. These types of knowledge are essential for scoring on the
indicators.

Thus, the evaluation process using hundreds of indicators and the
knowledge base becomes a very complex process requiring computer-aided
assistance. Such assistance is being provided in a software tool called
SENATE which is under development.

Figure 6 shows the SENATE tool browser containing the hierarchy of
indicators presented in Figure 2. The SENATE allows the user to create,
modify, delete, weight, and score on the indicators. An expert system
shell under development will enable the system administrators to store
rule-based knowledge into the inference engine of the SENATE. The
knowledge is used in the background during weighting and scoring on the
indicators.
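
How weights and scores might combine across a hierarchy of indicators can be sketched as follows. The paper does not state the aggregation rule used by SENATE, so the weighted average below is purely an assumption for illustration, as are the dictionary layout, the 0 to 100 score scale, and the example weights and scores.

```python
def aggregate_score(node):
    """Score of an indicator: its own score if it is a leaf, otherwise the
    weighted average of its children's scores (assumed aggregation rule)."""
    if "children" not in node:
        return node["score"]
    total_weight = sum(weight for weight, _ in node["children"])
    weighted_sum = sum(weight * aggregate_score(child)
                       for weight, child in node["children"])
    return weighted_sum / total_weight

# Hypothetical weights and scores on a 0-100 scale.
design_utility = {
    "name": "System Design Utility",
    "children": [
        (1.0, {"name": "Performance",
               "children": [
                   (2.0, {"name": "Response Time", "score": 80.0}),
                   (1.0, {"name": "Throughput", "score": 50.0}),
               ]}),
        (1.0, {"name": "Reliability", "score": 90.0}),
    ],
}

overall = aggregate_score(design_utility)
print(overall)
```

In a knowledge-based tool, rules about dependencies among indicators could adjust the weights before aggregation; that step is omitted here.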

3.3 The OPA Framework

The Objectives/Principles/Attributes (OPA) framework [Arthur and Nance
1987, 1990; Arthur et al. 1993; Nance and Arthur 1988] characterizes the
raison d'etre for software engineering and establishes definitive
linkages among project level objectives, software engineering principles,
and desirable product attributes as illustrated in Figure 7.

The OPA framework is applied in the organization and application of
indicators throughout the system engineering life cycle. It is essential
that we not only measure and evaluate the system design, but also the
process by which the design is created under project level objectives.
The OPA framework provides a comprehensive view covering project,
process, and product measurement and evaluation.

4.  CONCLUDING  REMARKS 

The Navy systems are indeed very complex and contain three diverse
components: software, hardware, and “humanware”. These components are
intertwined with real-time mission critical characteristics and pose
significant technical challenges for system designers, developers, and
maintainers independent of warfare area. A credible approach to the
assessment of such a complex system must include at least the following
elements: (1) a simulation model built to represent the system design so
that dynamic measurement can be done; (2) identification of hundreds of
qualitative and quantitative indicators to measure software, hardware,
and “humanware” components of the system design; (3) a knowledge-based
system that provides computer-aided assistance in the evaluation process;
(4) conducting the evaluation in an independent fashion, preferably by a
third party; and (5) identifying experts to evaluate some of the
indicators based on their expert knowledge.

ACKNOWLEDGEMENTS 

This research was sponsored in part by the U.S. Navy under contract
N60921-89D-A239 through the Systems Research Center at VPI&SU. The
authors acknowledge stimulating discussions with James D. Arthur which
contributed to the research described herein.


BIBLIOGRAPHY 

Ackerman, A.F., L.S. Buchwald and F.H. Lewski (1989), “Software
Inspections: An Effective Verification Process,” IEEE Software 6, 3
(May), 31-36.

Agresti, W.W. and W.M. Evanco (1992), “Projecting Software Defects from
Analyzing Ada Designs,” IEEE Transactions on Software Engineering 18, 11
(Nov.), 988-997.

Andersen, O. (1990), “Use of Software Engineering Data in Support of
Project Management,” Software Engineering Journal 5, 6 (Nov.), 350-356.

Anderson, G.E. (1984), “The Coordinated Use of Five Performance
Evaluation Methodologies,” Communications of the ACM 27, 2 (Feb.),
119-133.

Arthur, J.D. and R.E. Nance (1987), “Developing an Automated Procedure
for Evaluating Software Development Methodologies and Associated
Products,” Technical Report SRC-87-007, Systems Research Center, Virginia
Tech, Blacksburg, VA.

Arthur, J.D. and R.E. Nance (1990), “A Framework for Assessing the
Adequacy and Effectiveness of Software Development Methodologies,” In
Proceedings of the Fifteenth Annual Software Engineering Workshop,
Greenbelt, MD.


OBJECTIVES
Correctness
Reusability
Testability
Reliability
Portability
Adaptability

PRINCIPLES
Hierarchical Decomposition
Functional Decomposition
Information Hiding
Stepwise Refinement
Structured Programming
Life-Cycle Verification
Concurrent Documentation

ATTRIBUTES
Reduced Coupling
Enhanced Cohesion
Reduced Complexity
Well-Defined Interfaces
Readability
Ease of Change
Traceability
Visibility of Behavior
Early Error Detection

Figure 7. Illustration of the Relationship Among Objectives, Principles, Attributes
in the Software Development Process


Arthur, J.D., R.E. Nance, and O. Balci (1993), “Establishing Software
Development Process Control: Technical Objectives, Operational
Requirements, and the Foundational Framework,” The Journal of Systems and
Software, to appear.

Bache, R. and R. Tinker (1988), “A Rigorous Approach to Metrification: A
Field Trial Using Kindra,” In Software Engineering 88 Second IEE/BCS
Conference (Liverpool, England, July 11-13), IEE, London, England, pp.
28-32.

Balci, O. (1986), “Requirements for Model Development Environments,”
Computers & Operations Research 13, 1 (Jan.-Feb.), 53-67.

Balci, O. (1990), “Guidelines for Successful Simulation Studies,” In
Proceedings of the 1990 Winter Simulation Conference, O. Balci, R.P.
Sadowski, and R.E. Nance, Eds. IEEE, Piscataway, NJ, pp. 25-32.

Balci, O. and R.E. Nance (1987a), “Simulation Model Development
Environments: A Research Prototype,” Journal of the Operational Research
Society 38, 8 (Aug.), 753-763.

Balci, O. and R.E. Nance (1987b), “Simulation Support: Prototyping the
Automation-Based Paradigm,” In Proceedings of the 1987 Winter Simulation
Conference, A. Thesen, H. Grant, and W.D. Kelton, Eds. IEEE, Piscataway,
NJ, pp. 495-502.


136 


B^d,  O.  and  RJE.  Nance  (1992),  ‘The  Simulation 
Model  Development  ^viionmenu  An  Over- 
vkw,"  In  Proceedings  of  the  1992  Winter  Simula¬ 
tion  Conference  (Arlington,  VA,  Dec.  13-16). 
IEEE,  Piscataway,  N),  pp.  126-136. 

Barshefsky,  A.  and  jL.  (Waiter  (1984),  “Application  of 
Software  Metrics  to  Autoplex  Cellular  Devdop- 
ment,”  In  Conference  Record  of  IEEE  Global  Tel¬ 
ecommunications  Conference.  GLOBECOM  ‘84: 
Communications  in  the  Information  Age  (Atlanta, 
Georgia,  Nov.  26-29),  IEEE,  Piscataway,  New 
Jers^,  pp.  1293-1298. 

Barili,  V.K  and  E£.  Katz  (1983),  “Metrics  of  Interest 
in  an  ADA  Development,”  In  IEEE  Computer  So¬ 
ciety  Workup  on  Software  Engineering  Tech¬ 
nology  Transfer  (Miami  Beach,  Florida,  April  23- 
27),  IEEE,  Piscataway-  NJ,  pp.  22-29. 

Basili,  V.R.  and  R.W.  Selby  (1983),  “Calculation  and 
Use  if  an  Environment’s  Characteristic  Software 
Metric  Set,"  In  Proceedings  of  Eighth  Inter¬ 
national  Conference  on  Software  Engineering 
(London,  England,  August  28-30),  IEEE,  Piscat¬ 
away,  New  Jersey,  pp.  386-391. 

Beane,  J.,  N.  Giddings  aM  J.  Silverman  (1984),  “(^an- 
tifying  Software  Designs,”  In  Proceedings  •  Sev¬ 
enth  International  Corference  on  Software  En¬ 
gineering  (Orlando,  Fla,  March  26-29),  IEEE, 
Piscataway,  NJ,  pp.  314-322. 

Boehm,  B.W.  and  P.N.  Papaccio  (1988),  “Under¬ 
standing  and  Controlling  Software  Costs,”  IEEE 
Transactions  on  Software  Engineering  14,  10 
(Oct.),  1462-1477. 

Booth,  TJL.  and  B.  (^n  (1987),  “Use  of  Performance 
to  Guide  Software  Designs,”  In  Second  Inter¬ 
national  Conference  on  Computers  and  Applica¬ 
tions  (Beijing,  China,  June  23-27),  IEEE,  Kscat- 
away,  NJ,  pp.  303-311. 

Bryan,  Wi.  and  S.G.  Siegel  (1984),  “Product  As¬ 
surance:  Insurance  Against  a  Software  Disaster,” 
Computer  17, 4  (Apr.),  73-83. 

Buckley,  FJ.  (1989),  “Standard  Set  of  Useful  Software 
Metrics  Is  Urgently  Needed,”  Computer  22.  1 
(July),  88-89. 

Caldiera,  G.  and  V.R.  Basili  (1991),  “Identifying  and 
(^alifying  Reusable  Software  Conqranents,” 
Computer  24, 2  (Feb.),  61-70. 

Cardenas-Garcia,  S.  and  M.V.  Zelkowitz  (1991),  “A 
Management  Tool  for  Evaluation  of  Software  De¬ 
signs,”  IEEE  Transactions  on  Software  En¬ 
gineering  17, 9  (Sqpt.),  961-972. 

Carver,  DJL  (1988),  “Conq)arison  of  the  Effect  on  De¬ 
velopment  Paradigms  on  Increases  in  (jomplex- 
ity,”  Software  Engineering  Journal  3,  6  (Nov.), 
223-228. 

Chu,  W.W.,  C  Sit  and  K.K.  Leung  (1991),  "Task  Re¬ 
sponse  Hme  for  Real-Ume  Distributed  Systems 
With  Resource  Contentions,”  IEEE  Transactions 
on  Software  Engineering  17,  10  (Oct),  1076- 
1092. 


Qt^p,  J.  (1993),  “Getting  Staned  on  Software  Met¬ 
rics,” /£££  Software  10, 1  (Jan.),  108-110. 

Conte,  S.D.,  H.E.  Dunsmore,  and  V.Y.  Shen  (1986), 
Software  Engineering  Metrics  and  Models,  Ben¬ 
jamin  Cummings  Publishing,  Menlo  Park,  CA. 

Derrick,  EJ.  (1992),  “A  Visual  Simulatkm  Sui^xxt  En¬ 
vironment  Based  on  a  Multifac^ed  Conceptual 
Ramework,”  PhJ).  Dissertation,  Dqrartment  of 
Computer  Science,  VPI&SU,  Blacksburg,  VA, 
Apr. 

Denick,  EJ.  and  O.  Baki  (1992a),  “DOMINO:  A  Mul¬ 
tifaceted  (^oncqMual  Framework  for  Visual  Sim¬ 
ulation  Modeling,”  Technical  Rqxxt  TR-92-43, 
Department  of  Computer  Science,  VPI&SU, 
Bl^ksburg,  VA,  Aug. 

Dmkk,  EJ.  and  O.  Balci  (1992b),  “A  Visual  Simula¬ 
tion  Support  Environment  Based  on  the  DOMINO 
Concept^  Framewcffk,”  Technical  Repent 
TR-92-44,  Department  of  Computer  Science, 
VPI&SU,  Blacksburg,  VA,  Aug. 

Derrick,  EJ.  and  O.  Balci  (1992c).  “A  Visual  Simula¬ 
tion  Model  Specification  Language.”  (in  pnpar»- 
tion). 

Deutsch,  M.S.  (1988),  “Focusing  Real-Time  Systems 
Analysis  on  User  Operations,”  IEEE  Software  5,  3 
(SepD,  39-30. 

Emerson,  TJ.  (1984),  “A  Discriminant  Metric  for  Mod¬ 
ule  Cohesion,”  In  Proceedings  -  Seventh  In- 
tenational  Conference  on  Software  Engineering 
(Orlando,  F.a,  March  26-29),  IEEE,  Piscataway, 
NJ.pp.  294-303. 

Emery,  K.D.  and  B.K.  Mitchell  (1989),  “Multi-level 
Software  Testing  Based  on  Cyclomatic  Complex¬ 
ity,”  In  Proceedings  of  the  IEEE  1989  National 
Aerospace  and  Electronics  Conference  -  NAE- 
CON  1989  (Dayton,  Ohio,  May  22-26), ,  Piscat¬ 
away,  NJ,  pp.  300-307. 

Far.  W.H.  and  A.  Ashton  (1992),  “Developing  a  Met¬ 
rics  Assessment  Program  for  the  SLBM  Solbvare 
Development  Division,”  In  Proceedings  of  the 
1992  Complex  Systems  Engineering  Synthesis  and 
Assessment  Technology  Workshop,  NSWC,  Silver 
Siting,  MD,  pp.  139-146. 

Fenick,  S.  (1990),  “Implementing  Management  Met¬ 
rics:  An  Army  Program,”  IEEE  Software  7,  2 
(Mar.),  63-72. 

Fenton,  NJE.  and  A.A.  Kaposi  (1987),  “Metrics  and 
Software  Structure,”  Information  and  Software 
Technology  29.  6  (Jul.-Aug.),  301-320. 

Ireedman,  RS.  (1991),  “Testability  of  Software  Com¬ 
ponents,”  IEEE  Transactions  on  Software  En¬ 
gineering  17,  6  (June),  333-364. 

Geist,  R.  and  K  Trivedi  (1990),  “Reliability  Estimation 
of  Fault-Tolerant  Systems:  Tools  and  Tech¬ 
niques,”  Computer  23. 1  (July), -52-62. 

Gibson,  V.R.  and  J.A.  Senn  (1989),  “System  Structure 
and  Software  Maintenance  Performance,”  Com¬ 
munications  of  the  ACM  32,  3  (Mar.),  347-358. 


137 


i 


Gilb,  T.  (1977),  Software  Metnii,  Winthrop  Publishers, 
Cambridge,  MA. 

Gilb,  T.  (1985),  "Software  Specification  and  Design 
Must  ‘Engineer’  C^ality  and  Cost  Iterativel/*,  In 
Third  International  Workshop  on  Software  Spec¬ 
ification  and  Design  (London,  England,  August 
26-27),  IEEE,  London,  England,  pp.  75-76. 

Gill,  G.K.  and  C.F.  I^emerer  (IWl),  “C^clomatic  Com¬ 
plexity  Density  and  Software  Maintenance  Pro¬ 
ductivity,”  IEEE  Transactions  on  Software  En¬ 
gineering  17, 12  (Dec.),  1284-1288. 

Gould,  JJ}.,  SJ.  Boies,  and  C.  Lewis  (1991),  "Making 
Usable,  Useful,  Productivity.  Enhancing  Con^)ut- 
er  Applications,”  Communications  of  the  ACM  34, 

1  (Jan.),  75-85. 

Grady,  R.B.  (1987),  "Measuring  and  Managing  Soft¬ 
ware  Maintenance,”  IEEE  Software  4,  5  (Sq)t.), 
35-45. 

(jrady,  R.B.  (1990),  "Wwk-Product  Analysis:  The  Phi¬ 
losopher’s  Stone  of  Software?,”  IEEE  Software  7, 

2  (Mar.),  26-34. 

Grady,  R.B.  and  D.L.  (Taswell  (1987),  Software  Met¬ 
rics:  Establishing  a  Company-Wide  Program, 
Prentice-Hall,  Englewood  Cliffs,  NJ. 

Gremillion,  L.L.  (1984),  “Determinants  of  Program  Re¬ 
pair  Maintenance  Requirements,”  Communica¬ 
tions  of  the  ACM  27,  8  (Aug.),  826-832. 

Hall,  DJ-.,  J.J.  Gibbons  and  D.A.  Woodle  (1985), 
"Avoid  Disaster;  The  Use  of  an  Integrated  Tool 
for  Managing  Throughput  and  Response  Time  Re¬ 
quirements  in  Embedded  Real-Time  Systems,”  In 
Conference  on  Software  Tools  (New  York,  NY, 
April  15-17),  IEEE,  Piscataway,  NJ,  pp.  106-111. 

Han,  W.,  Y.  Choe  and  Y.  Park  (1987),  “Software  Met¬ 
rics  Using  Operand  Types,”  In  Proceedings  - 
TENCON  87:  1987  IEEE  Region  10  Conference. 
'Computers  and  Communications  Technology  To¬ 
wards  2000'  (Piscataway,  New  Jersey,  August  25- 
28),  IEEE,  Seoul,  South  Korea,  pp.  1212-1215. 

Heitkoeten,  U.  (1990),  "Design  Metrics  and  Their  Aid 
to  Automatic  Collection,”  Information  and  Soft¬ 
ware  Technology  32, 1  (Jan.-Feb.),  79-87. 

Henry,  S.  and  C.  Selig  (1990),  “Predicting  Source-Code 
Complexity  at  Ae  Design  Stage,”  IEEE  Software 
7. 2  (Mar.),  36-43. 

Henry,  S.  and  D.  Kaftira  (1984),  "The  Evaluation  of 
Systems  Structure  Using  (^antitative  Software 
Metrics,”  Software  Practice  and  Experience  14,  6 
(June),  561-573. 

Henry,  S.  and  R.  Goff  (1989),  "Complexity  Measure¬ 
ment  of  a  Graphical  I^gramming  Language,” 
Software-  Practice  and  Experience  9,  4  (Nov.), 
1065-1088. 

Henry,  S.  and  R.  Goff  (1991),  “CTomparison  of  a  Graph¬ 
ical  and  a  Textual  Design  Language  Using  Soft¬ 
ware  Quality  Metrics,”  Journal  o/  Systems  and 
Software  14, 3  (Mar.),  133-144. 


Herndon,  MA.  and  JA.  McCall  (1983),  "The  Re-  ( 

quirements  Management  Methodology:  A  Meas¬ 
urement  Framework  for  Total  Systems  Re¬ 
liability,”  In  Total  Systems  Reliability  Symposium 
(GaitherslHirg,  MD,  Dec.  12-14),  IEEE,  Piscat¬ 
away,  NJ,  pp.  1 19-122. 

Hirayama,  M.  (1990),  "Practice  of  (^ality  Modeling 
and  Measurement  on  Software  Life-Cycle,”  In 
12th  International  Conference  on  Software  En¬ 
gineering  (Nice,  France,  Mar.  26-30),  IEEE,  Pis¬ 
cataway,  NJ,  pp.  98-107. 

Hoffman,  GJO.  (1989),  "Early  Introduction  of  Software 
Metrics,”  In  Proceedings  of  the  IEEE  1989  Na¬ 
tional  Aerospace  and  Electronics  Conference  -  ■ 

NAECON  1989  (Dayton,  Ohio,  May  22-26), 

IEEE,  Piscataway,  NJ,  pp.  559-563.  ' 

Ince,  D.C.  arxl  S.  Hekrriatpour  (1988),  "An  Approach  to 
Automated  Software  Design  Based  on  Product 
Metrics,”  Software  Engineering  Journal  3,  2 
(Mar.),  53-56. 

Ince,  D.O.  and  MJ.  Sheppard  (1988),  "System  Design 

Metrics:  A  review  and  perspective,”  In  Software  I 

Engineering  88  Second  lEE/BCS  Conference  | 

(Liverpool,  England,  July  11-15),  lEE,  London, 

England,  pp.  23-27.  . 

Joshi,  SM.  and  K.B.  Misra  (1991),  “(^antitative  Anal-  I 

ysis  of  Software  (^ality  During  the  ‘Design  and  * 

Implementation’  Phase,”  Microelectronics  and 
Reliability  31.  5,  m-m.  ( 

Karolak,  D.K.  (1985),  “Identifying  Software  Quality 

Metrics  for  a  L^e  Software  Development,”  In  ' 

GWBECOM  '85:  IEEE  Global  Tele¬ 
communications  Conference  Record  (New  Or¬ 
leans,  LA,  Dec.  2-5),  IEEE,  Piscataway,  NJ,  pp. 

61-64. 

Karunanithi,  N.,  D.  Whitley,  and  Y.K.  Malaiya  (1992), 
"Predictability  of  Software  Reliability  Using  Con- 
nectionist  Models,”  IEEE  Transactions  on  Soft¬ 
ware  Engineering  18, 7  (July),  563-574. 

Kavinde,  T.M.  (1989),  “Performance  Analysis  of  Soft¬ 
ware:  CDOT  case  study,”  In  TENCON  '89: 

Fourth  IEEE  Region  10  International  Conference 
(Bombay,  India,  Nov.  22-24),  IEEE,  Piscataway, 

New  Jersey,  pp.  718-721. 

Kearney,  J.K.,  KL.  Sedlmeyer,  W.B.  Thompson,  M.A. 

Gray  and  M.A.  Adler  (1986),  “Software  Complex¬ 
ity  Measurement,”  Communications  of  the  ACM 
29. 11  (Nov.),  1044-1050. 

KemerCT,  CF.  (1987),  “An  Empirical  Validation  of 
Software  Oast  Estimation  Models,”  Communica¬ 
tions  of  the  ACM  30,  5  (May),  416-429. 

Kemerer,  C.F.  (1993),  “Reliability  of  Function  Points 
Measurement,”  Communications  of  the  ACM  36,  2 
(Feb.),  85-97. 

Kemerer,  CE.  and  B.S.  Porter  (1992),  “Improving  the  I 

Reliability  of  Function  Point  Measurement:  An  | 

Empirical  Study,”  IEEE  Transactions  on  Software 
Engineering  18, 11  (Nov.),  1011-1024. 

i 


138 


Kboshgoitaar,  T^.,  J.C  Munson,  B.B.  Bhattacharaya 
and  G.D.  Riduudson  (1992),  "Predictive  Mod¬ 
eling  Techniques  of  Software  Quality  from  Soft¬ 
ware  Measures,”  IEEE  Transactions  on  Software 
Engineering  7 A  1 1  (Nov.),  979-987. 

Kitchenham,  B.A.  (1988),  "An  Evaluation  of  Software 
Structure  Metrics,”  In  Proceedings  of  the  Twelfth 
Annual  International  Computer  ^ftwre  and  Ap¬ 
plications  Coiference  (COMPSAC  88)  (Chicago, 
Illinois,  Oct.  5-7),  IEEE,  Piscataway,  NJ,  pp.  369- 
376. 

Kitchenham,  B.A.  and  J.A.  McDermid  (1986),  "Soft¬ 
ware  Metrics  and  Integrated  Support  Environ¬ 
ments,”  Software  Engineering  1, 1  (Jan.),  58-64. 

Kitchenham,  B.A.  and  SJ.  Linkman  (1990),  "Design 
Metrics  in  Practice,”  Information  Software  and 
Technology  32. 4  (May),  304-310. 

Kitchenham,  BA.,  L.M.  Pickard,  and  SJ.  Linkman 
(1990),  "Evaluation  of  some  design  metrics,”  Soft¬ 
ware  Engineering  Journal  9, 1  (Jan.),  50-58. 

Laydimanan,  K.B.,  S.  Jayapralmsh  and  P.K.  Sinha 
(1991).  "Properties  of  Control-Flow  Complexity 
Measures,”  IEEE  Transactions  on  Software  En¬ 
gineering  17. 12  (Dec.),  1289-1295. 

Laranjeira,  LA.  (1990),  "Software  Size  Estimation  of 
Object-Oriented  Systems,”  IEEE  Transactions  on 
Software  Engineering  16. 5  (May),  510-522. 

Lew,  K.S.,  T.S.  Dillon  and  K.E.  Forward  (1988),  “Soft¬ 
ware  Complexity  and  Its  Impact  on  Software  Re¬ 
liability,”  IEEE  Transactions  on  Software  En¬ 
gineering  14. 11  (Nov.),  1645-1655. 

Litke,  J.  (1992),  "A  Method  for  the  Assessment  of  Sys¬ 
tem  Designs,”  In  Proceedings  of  the  1992  Com¬ 
plex  Systems  Engineering  Synthesis  and  Assess¬ 
ment  Technology  Workshop,  NSWC,  Silver 
Spring,  MD,  pp.  155-169. 

Low,  G.C.  and  DJ^  Jeffrey  (1990),  "Function  Points  in 
the  Estimation  and  Evaluation  of  the  Software 
Process,”  IEEE  Transactions  on  Software  En¬ 
gineering  16. 1  (Jan.),  64-71. 

MacKnight,  C.B.  and  S.  Balagopalan  (1989),  "An  Eval¬ 
uation  Tool  for  Measuring  Authoring  System  Per¬ 
formance,”  Communications  of  the  ACM  32.  10 
(Oct.),  1231-1236. 

McCabe,  TJ.  and  C.W.  Butler  (1989),  "Design  Com¬ 
plexity  Measurement  and  Testing,”  Communica¬ 
tions  of  the  ACM  32. 12  (Dec.),  1415-1425. 

McCabe,  TJ.,  L.F.  Young,  K.W.  Qaybaugh  and  J. 
McManus  (1983),  "Design  Basis  Paths:  A  Com¬ 
plexity  Driven  Design  Inspection  Methodology,” 
In  Total  Systems  Reliability  Symposium  (Gai¬ 
thersburg,  MD,  Dec.  12-14),  I]^E,  Piscataway, 
NJ,pp.  67-72. 

Mohanty,  S.N.  (1981),  “Entropy  Metrics  for  Design 
Evaluation,”  Journal  of  Systems  and  Software  2.  1 
(Feb.),  39-46. 


Mukhopadhyay,  T.  and  S.  Kekre  (1992),  "Software  Ef¬ 
fort  Models  for  Early  Estimation  of  Process  Con¬ 
trol  Applications,”  IEEE  Transactions  on  Soft¬ 
ware  Engineering  18. 10  (OcL),  915-924. 

Munson,  J.  and  TAl.  Khosh^ftaar  (1992),  "The  De¬ 
tection  of  Fault-Prone  Programs,”  IEEE  Trans¬ 
actions  on  Software  Engineering  18.  5  (May), 
423-433. 

Munson,  J.C  and  TM.  Khoshgoftaar  (1992),  "Meas¬ 
uring  Dynamic  Program  Complexity,”  IEEE  Soft¬ 
ware  9.6  0iov.).4%-55. 

Musa,  J.D.  and  Af .  Ackerman  (1989),  "(^antifying 
Software  Validation:  When  to  Stop  Testing?,” 
IEEE  Software  6. 3  (May),  19-27. 

Nance,  R.E.  and  JJ>.  Arthur  (1988),  "The  Methodology 
Roles  in  the  Realization  of  a  Model  Development 
Environment,”  In  Proceedings  of  the  1988  Winter 
Simulation  Conference,  pp.  220-225. 

Navlakha,  J.K.  (1987),  “A  Survey  of  System  Complex¬ 
ity  Metrics,”  Computer  Journal  30,  3  (June),  233- 
238. 

Nejmeh,  BA.  (1988),  “NPATH:  A  Measure  of  Execu¬ 
tion  Path  Complexity  and  its  Applications,”  Com¬ 
munications  of  the  ACM  31. 2  (F^.),  188-200. 

Nguyen,  CM.  and  SX.  Howell  (1992),  "System  Design 
Factors,”  In  Proceedings  of  the  1992  Complex 
Systems  Engineering  Synthesis  and  Assessment 
Technology  Workshop.  NSWC  Silver  Spring, 
MD,pp.  147-154. 

Pamas,  DJL.,  J.v.  Schouwen,  and  S.P.  Kwan  (1990), 
"Evaluation  of  Safety-Critical  Software,”  Com¬ 
munications  of  the  ACM  33.  6  (June),  636-648. 

Paulish,  DJ.  (19%),  "Methods  and  Metrics  for  De¬ 
veloping  High  (^ality  Patient  Monitoring  System 
Software,”  In  Proceedings  of  the  Third  Annual 
IEEE  Syn^osium  on  Computer-Based  Medical 
Systems  (Qu^l  Hill,  NC  Jun  3-6),  IEEE,  Piscat¬ 
away,  NJ,  pp.  145-152. 

Pollock,  (3.M.  and  S.  Sheppard  (1987),  "A  Design 
Methodology  for  the  Utilization  of  Metrics  Within 
Various  Ph^s  of  Software  Life-cycle  Modds,” 
In  Proceedings  II  -  COMPSAC  87:  The  Eleventh 
Annual  International  Computer  Software  and  Ap¬ 
plications  ConferetKe  (Tokyo,  Jtqran,  Oct  7-9), 
IEEE,  Piscataway,  NJ,  pp.  221-230. 

Porter,  A.A.  and  R.W.  Sdby  (1990),  “Empirically 
Guided  Software  Development  Using  Metric- 
Based  Gassification  Trees,”  IEEE  Software  7.  2 
(Mar.),  46-54. 

Ramamoorthy,  C.V.,  A.  Bhide  and  V.  Garg  (1986), 
"Software  (^ality  and  Requirements  Specifica¬ 
tion,”  In  Proceedings  -  IEEE  Computer  Society 
1986  Intematiorud  Conference  on  Computer  Lan¬ 
guages  (Miami,  Florida,  Oct.  27-30),  IEEE,  Pis¬ 
cataway,  NJ,  pp.  75-83. 

Ramamoorthy,  C.V.,  W.  Tsai  and  Y.  Usuda  (1984), 
"Software  Engineering:  Problems  and  Per- 
qrectives,”  Computer  17. 10  (Oct),  191-207. 


139 


Ramamooithy,  C.V.,  W.  Tsai,  T.  Yamaura,  and  A. 
Bhkle  (1985),  ‘Metrics  Guided  Methodology,”  In 
Proceedings  -  COMPSAC  85:  The  IEEE  Comput¬ 
er  Society's  Ninth  International  Computer  Soft¬ 
ware  and  Applications  Cortference  (Chicago,  D- 
linois,  Oct.  9-11),  IEEE,  Piscataway,  New  Jersey, 
PP.11M20. 

Ranuunuithy,  N.  and  A.  Mdton  (1988),  “A  Synthesis  of 
Software  Science  Measures  and  die  Cyclomatic 
Number,”  IEEE  Transactions  on  Software  En¬ 
gineering  14,  8  (Aug.),  1116-1121. 

Rdbman,  A.L.  and  M.  Veeraraghavan  (1991),  ”Re- 
liability  Modeling:  An  Overview  for  Systems  De¬ 
signers,”  Computer  24, 4  (Apr.),  49-56. 

R^olds,  R.G.  (1987),  ”Metric-Based  Readoning 
About  Psuedocode  Design  in  die  Partial  Metrics 
System,”  Information  and  Software  Technology 
29. 9  (Nov.),  497-502. 

Reynolds,  R.G.  (1987),  "The  Partial  Metrics  System: 
Modeling  the  Stepwise  Refinement  Process  Using 
Partial  Metrics,”  Communications  of  the  ACM  30, 
11  (Nov.),  956-963. 

R^old^  R.G.  (1990),  "Partial  Metrics  System:  A 
Tool  to  Support  the  Metrics-Driven  Design  of 
Psuedocode  Programs,”  Journal  of  Systems  and 
Software  9, 4  (Jan.),  287-295. 

Rombach,  HD.  (1990),  "Design  Measurement:  Some 
Lessons  Les^ed,”  IEEE  Software  7,  2  (Mar.),  17- 
25. 

Schneidewind,  N  J.  (1979),  "Software  Metrics  for  Aid¬ 
ing  Program  Development  and  Debugging,”  In 
AFIPS  Conference  Proceedings  VoL  48  (New 
York,  NY,  June  4-7),  AFIPS  Press,  Montvale,  NJ, 
pp.  989-994. 

Schneidewind,  NJ.  (1992),  "Methodology  for  Val¬ 
idating  Software  Metrics,”  In  Proceedings  of  the 
1992  Complex  Systems  Engineering  Synthesis  and 
Assessment  Technology  Workshop,  NSWC,  Silver 
Spring.  MD,  pp.  171-198. 

Selby,  R.W.  (1^),  “Extensible  Integration  Frame¬ 
works  for  Measurement,”  IEEE  Software  7,  6 
(Nov.),  83-84. 

Slupperd,  M.  (1990),  “Design  Metrics:  An  Enqiirical 
Analysis,”  Square  Engineering  Journal  5,  1 
(Jan.),  3-10. 

Sheppard,  M.  (1990),  "Early  Life-cycle  Metrics  and 
Software  quality  models,”  Information  and  Soft¬ 
ware  Technology  32, 4  (May),  311-316. 

Shepperd,  M.  (1988),  "An  Evaluation  of  Software  Prod¬ 
uct  metrics,”  Information  and  Software  Tech¬ 
nology  30. 3  (Apr.).  177-288. 

Shq>perd,  M.  and  D.  Ince  (1989),  "Metrics,  Outlier 
Analysis  and  the  Software  Elesign  Process,”  In¬ 
formation  and  Software  Techtology  31.  2  (Mar.), 
91-98. 


Shepperd,  M.  and  D.  Ince  (1990),  "The  Use  of  Metrics 
for  the  Early  Detection  of  Design  ErtiMs.”  In 
SE90:  Proceedings  of  Software  Engineering  90 
(Bri^iton,  UK,  July  24-27).  Camtnidge  Uni¬ 
versity  Press,  Cambridge,  UK,  pp.  67-88. 

Silverman,  J.,  N.  Giddings,  and  J.  Beane  (1983),  "An 
Approach  to  Design-for-Maintenaoce,”  In  Record- 
Software  Mednterumce  Workshop  (Monterey,  OA, 
Dec.  6-8),  IEEE,  Piscataway,  NJ,  pp.  106-110. 

Smith,  C.  and  J.C.  Browne  (1980),  "Aq)ects  of  Soft¬ 
ware  Design  Analysis:  (Concurrency  and  Block¬ 
ing.”  Performance  Evaluation  Review  9,  2  (Sum¬ 
mer).  245-253. 

Symons,  CJL  (1988),  "Function  Pdnt  Analysis:  Dif¬ 
ficulties  and  In^rovements,”  IEEE  Transactions 
on  Software  Engineering  14,  1  (Jan.),  2-12. 

Troy,  D.A.  and  S.H.  Zweben  (1981),  "Measuring  the 
Quality  of  Structured  Designs,”  Journal  of  Sys¬ 
tems  and  Software  2, 2  (June),  113-120. 

Vdez,  C£.  and  PA.  Scheffer  (1978),  “On  die  Problem 
of  Software  Design  and  Measuring  (^ality,”  In 
IEEE  Proceedings  of  the  National  Aerospace  and 
Electronics  Corference  NAECON  ‘78  (Dayton, 
Ohio,  May  16-18),  IEEE,  Piscataway,  New  Jersey, 
pp.  223-229. 

Vemer,  J.  and  G.  Tate  (1992),  “A  Software  Size  Mod¬ 
el,”  IEEE  Transactions  on  Software  Engineering 
JS.  4  (Apr.),  265-278. 

Vesscy,  I.  and  R.  Weber  (1983),  “Some  Factors  Af¬ 
fecting  Program  Repair  Maintenance,”  Com¬ 
munications  of  the  ACM  26, 2  (Feb.),  128-136. 

Vienneau,  RX.  (1992),  "The  (Consolidated  Experience 
Factory:  An  Approach  for  Instrumenting  Systems 
Engineering,”  In  Proceedings  of  the  1992  Com¬ 
plex  Systems  Engineering  Synthesis  and  Assess¬ 
ment  Technology  Workshop,  NSWCC,  Silver 
Spring,  MD,  pp.  201-206. 

Walters,  GF.  and  J.A.  McCall  (1979),  “Software  Qual¬ 
ity  Metrics  for  Life-CCycle  Cost-Reduction,”  IEEE 
Transactions  on  Reliability  R-28,  3  (Aug.),  212- 
220. 

Weyuker,  EJ.  (1988),  “Evaluating  Software  Complex¬ 
ity  Measures,”  IEEE  Transactions  on  Software 
Engineering  14. 9  (Sept.),  1357-1365. 

Whitworth,  M.H.  and  PA.  Szulewski  (1980),  "The 
Measurement  of  (Control  and  Data  Flow  Conqilex- 
ity  in  Software  Designs,”  In  IEEE  Computer  So¬ 
ciety  International  Computer  Software  Applica¬ 
tions  Conference  4th  COMPSAC  80  (Chicago, 
Illinois,  Oict.  27-31),  IEEE,  Piscataway,  New  Jer¬ 
sey,  pp.  735-743. 

Wohlin,  C.  and  D.  Rapp  (1989),  “Performance  Analysis 
in  the  Early  Design  of  Software,”  In  Seventh  In¬ 
ternational  Conference  on  Software  Engineering 
for  Telecommunications  Switching  Systems 
(Bournemouth,  England,  July  3-6),  lEE,  London, 
England,  pp.  114-121. 


140 


OPTIMAL SELECTION OF FAILURE DATA FOR PREDICTING FAILURE COUNTS

Norman F. Schneidewind


Code  AS/Ss 

Naval  Postgraduate  School 

Monterey,  CA  93943 

(408)  656-2719/2471 

FAX:  (408)  656-3407 

Internet: 0442p@vm1.cc.nps.navy.mil


Abstract 

In the use of software reliability models it is not necessarily the
case that all the failure data should be used to estimate model
parameters and to predict failures. The reason for this is that old data
may not be as representative of the current and future failure process
as recent data. Therefore it may be possible to obtain more accurate
predictions of future failures by excluding or giving lower weight to
the earlier failure counts. Although techniques such as moving average
and exponential smoothing are frequently used in other fields, such as
inventory control, we did not find use of this idea in the various
models we surveyed. One model that includes the concept of selecting a
subset of the failure data, where appropriate, is the Schneidewind
Non-Homogeneous Poisson Process (NHPP) software reliability model. In
order to use the concept of "data aging", there must be a criterion for
determining the optimal value of the starting failure count interval. In
previous research we identified the mean square error as the best
criterion for selecting the starting interval of the failure data. In
this paper we apply the criterion to select the optimal starting
interval. We show that significantly improved reliability predictions
can be obtained by using a subset of the failure data, based on applying
the criterion, and using the Space Shuttle On-Board software as an
example.

Keywords: NHPP software reliability model, optimal selection of failure
data, Space Shuttle.


INTRODUCTION 

In the use of software reliability models it is not necessarily the
case that all the failure data should be used to estimate model
parameters and to predict failures. The reason for this is that old data
may not be as representative of the current and future failure process
as recent data. If the failure process remains the same over a long
series of observations, we should use a great deal (or all) of the
failure data; if there is a significant change in the process, we should
use only the most recent observations [BRO 63]. Therefore it may be
possible to obtain more accurate predictions of future failures by
excluding or giving lower weight to the earlier failure counts. Although
techniques such as moving average and exponential smoothing are
frequently used in other fields, such as inventory control, we did not
find mention of this idea in the many models we examined in various
papers and reports that contain surveys of models [AIA 91, ABD 86, FAR
91, FAR 83, GOE 85, LIT 80]. One model that includes the concept of
selecting a subset of the failure data is the Schneidewind
Non-Homogeneous Poisson Process (NHPP) software reliability model [SCH 75,
XIE 92]. In order to use the concept of data aging (i.e., giving more
weight to recent failure counts), there must be a criterion for
determining the optimal value of $s$, an index in the range
$1 \le s \le t$, which is the starting value of equal length failure
count intervals. In this model one may choose to use all the failure
counts in the execution intervals from 1 to $t$ (Method 1), exclude
counts from 1 to $s-1$ (Method 2), or use an aggregate count from 1 to
$s-1$ and individual counts from $s$ to $t$ (Method 3).
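
The three data-selection options can be made concrete with a small sketch. The Python fragment below is illustrative only: the function name `select_failure_data` and its return convention are assumptions introduced here, not part of the model's published description. It returns the aggregate count for intervals 1 through s-1 (zero when that aggregate is not used) together with the individual interval counts that enter the estimation.

```python
from typing import List, Sequence, Tuple


def select_failure_data(
    counts: Sequence[float], s: int, method: int
) -> Tuple[float, List[float]]:
    """Pick the failure data used by Method 1, 2, or 3 (hypothetical helper).

    counts[0] is the failure count of interval 1; s is a 1-based interval index.
    Returns (aggregate count for intervals 1..s-1, individual counts used).
    """
    t = len(counts)
    if not 1 <= s <= t:
        raise ValueError("s must satisfy 1 <= s <= t")
    if method == 1:
        # Method 1: every interval count from 1 through t is used individually.
        return 0, list(counts)
    if method == 2:
        # Method 2: counts from intervals 1..s-1 are excluded entirely.
        return 0, list(counts[s - 1:])
    if method == 3:
        # Method 3: intervals 1..s-1 contribute one aggregate count,
        # intervals s..t contribute individual counts.
        return sum(counts[:s - 1]), list(counts[s - 1:])
    raise ValueError("method must be 1, 2, or 3")
```

For example, with counts `[5, 3, 2, 1]` and `s = 3`, Method 2 keeps only `[2, 1]`, while Method 3 keeps `[2, 1]` plus the aggregate 8 from the first two intervals.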

Importance of Research

The  importance  of  this  research  is  that  significant  improvements  were 
obtained  in  the  accuracy  of  predicting  failure  count  and  time  to  next 
failure  (due  to  space  limitations,  only  the  failure  count  analysis  is 
presented  in  this  paper)  by  not  using  all  the  observed  failure  data, 
where  appropriate,  as  we  will  illustrate  in  the  examples.  Also  of 
significance  is  the  identification  of  a  criterion  for  determining  "where 
appropriate";  that  is,  the  method  for  determining  the  optimal  value  of 
s,  s*,  where  "optimal"  is  defined  as  the  value  of  s  that  produces  the 
most  accurate  predictions.  This  research  was  conducted  on  the 
Schneidewind model and the criterion was applied to the Space Shuttle
On-Board  flight  software.  Since  this  model  is  used  to  assist  IBM-Houston 
in  making  software  reliability  predictions  for  the  Space  Shuttle 
software,  we  were  motivated  to  find  a  generic  method  for  optimal  failure 
data  selection  and  to  apply  this  method  to  obtain  the  most  accurate 
predictions  possible  for  the  Space  Shuttle  [SCH  92,  AIA  92].  The 
concepts  developed  here  have  general  applicability  to  other  models  but 
in  order  to  realize  the  advantages  of  optimal  data  selection,  it  would 
be  necessary  to  modify  the  parameter  estimation  methods  used  in  those 
models  to  explicitly  allow  for  subsets  of  the  failure  data  to  be  used. 

The purpose of our research is to demonstrate the effectiveness of the
Mean Square Error (MSE) criterion, which we identified as the best of
four criteria which were developed and analyzed in previous research
[SCH 93], for selecting $s^*$. We demonstrate that, when conditions
warrant, $s^* > 1$ can produce more accurate failure predictions than
$s = 1$ for the Space Shuttle software.

Before  discussing  the  criterion  for  selecting  s*,  we  provide  an 
overview  of  the  Schneidewind  model  parameter  estimation  in  order  to 
establish  the  rationale  for  data  aging.  As  a  by-product  of  this  analysis 
we  show  that,  for  certain  modules,  dramatic  improvements  can  be  made  in 
prediction  accuracy  by  not  using  all  the  failure  counts.  We  close  with 
conclusions  about  the  utility  of  the  data  aging  approach  and  the 
criterion  to  use  for  data  aging;  we  also  indicate  our  future  research 
efforts. 


OVERVIEW  OF  SCHNEIDEWIND  MODEL  PARAMETER  ESTIMATION 

The method of maximum likelihood is used to estimate the model
parameters $\alpha$ and $\beta$, for a given $s$, where $\alpha$ is the
failure rate at $t = 0$ and $\beta$ is the failure rate time constant
(i.e., a measure of how fast the failure rate decays: with a failure
rate of the form $\alpha \exp(-\beta t)$, the larger the value of
$\beta$, the faster the failure rate decreases).

a. Parameter estimation: Method 1

Use all of the failure counts from interval 1 through $t$ ($1 = s \le t$).
This method is used if it is assumed that all of the historical failure
counts from 1 through $t$ are representative of the future failure
process. Equations (1) and (2) are used to estimate $\beta$ and
$\alpha$, respectively [SCH 75, FAR 83, FAR 91].

$$\frac{1}{\exp(\beta)-1} - \frac{t}{\exp(\beta t)-1} = \sum_{k=0}^{t-1} \frac{k\,x_{k+1}}{X_t} \qquad (1)$$

$$\alpha = \frac{\beta X_t}{1-\exp(-\beta t)} \qquad (2)$$

where $x_{k+1}$ are failure counts in intervals $1, 2, \ldots, k+1, \ldots, t$
and $X_t$ is the cumulative failure count in $1,t$.

b. Parameter estimation: Method 2

Use failure counts only in the intervals $s$ through $t$
($1 \le s \le t$). This method is used if it is assumed that only the
historical failure counts from $s$ through $t$ are representative of the
future failure process. Equations (3) and (4) are used to estimate
$\beta$ and $\alpha$, respectively [SCH 75, FAR 83, FAR 91].

$$\frac{1}{\exp(\beta)-1} - \frac{t-s+1}{\exp(\beta(t-s+1))-1} = \sum_{k=0}^{t-s} \frac{k\,x_{s+k}}{X_{s,t}} \qquad (3)$$

$$\alpha = \frac{\beta X_{s,t}}{1-\exp(-\beta(t-s+1))} \qquad (4)$$

where $x_{s+k}$ are failure counts in intervals $s, s+1, \ldots, s+k, \ldots, t$
and $X_{s,t}$ is the cumulative failure count in $s,t$. We note that
Method 2 is equivalent to Method 1 for $s = 1$ (i.e., (1) and (2) are
obtained by substituting $s = 1$ in (3) and (4), respectively).
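
As a minimal sketch of how equations (3) and (4) might be solved in practice, the Python fragment below finds $\beta$ by bisection and then computes $\alpha$. It assumes the standard form of the Method 2 likelihood equations, 1/(exp(β)-1) - n/(exp(βn)-1) = Σ k·x_{s+k} / X_{s,t} with n = t-s+1, and α = β·X_{s,t} / (1 - exp(-βn)). The function names and the choice of bisection are assumptions made for illustration; the paper does not say which root-finding method was used.

```python
import math
from typing import Sequence, Tuple


def _lhs(beta: float, n: int) -> float:
    """Left-hand side of equation (3) for n = t - s + 1 intervals."""
    bn = beta * n
    # For very large beta*n the second term is negligible; skip it to avoid overflow.
    second = 0.0 if bn > 700.0 else n / math.expm1(bn)
    return 1.0 / math.expm1(beta) - second


def estimate_method2(counts: Sequence[float], s: int = 1) -> Tuple[float, float]:
    """Return (alpha, beta) using only intervals s..t; counts[0] is interval 1."""
    if s < 1:
        raise ValueError("s must be at least 1")
    used = counts[s - 1:]
    n = len(used)
    total = sum(used)  # X_{s,t}
    if n < 2 or total <= 0:
        raise ValueError("need at least two intervals and one failure")
    # Right-hand side of (3): count-weighted mean of the interval offset k.
    rhs = sum(k * x for k, x in enumerate(used)) / total
    if not 0.0 < rhs < (n - 1) / 2.0:
        raise ValueError("no positive root: counts do not show a decaying rate")
    lo, hi = 1e-6, 1.0
    while _lhs(hi, n) > rhs:  # expand until the root is bracketed
        hi *= 2.0
        if hi > 512.0:
            raise ValueError("root not bracketed")
    for _ in range(200):  # bisection; the left-hand side decreases as beta grows
        mid = 0.5 * (lo + hi)
        if _lhs(mid, n) > rhs:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = beta * total / (-math.expm1(-beta * n))  # equation (4)
    return alpha, beta
```

Calling `estimate_method2(counts, s=1)` corresponds to Method 1, since (3) and (4) reduce to (1) and (2) for s = 1.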

c. Parameter estimation: Method 3


Use  the  cumulative  failure  count  in  the  interval  l  through  s-l  and 
individual  failure  counts  in  the  intervals  s  through  t  (2S8St) .  This 
m  ''hod  is  used  if  it  is  assumed  that  the  historical  cumulative  failure 
count  from  1  through  s-l  and  the  individual  failure  counts  from  s 
through  t  are  representative  of  the  future  failure  process.  This  method 
is  intermediate  to  Method  1,  which  uses  all  the  data,  and  Method  2, 
which  discards  "old"  data.  Equations  (5)  and  (6)  are  used  to  estimate  p 
and  a,  respectively  [SCH  75,  FAR  83,  FAR  91). 
where $X_{s-1}$ is the cumulative failure count in 1,s-1. We note that Method
3 is equivalent to Method 1 for s = 2 (i.e., (1) is obtained by
substituting s = 2 in (5)).

The three methods are summarized in Table 1 with respect to the
observed parameter estimation range and the prediction range, both
observed (i ≤ t) and future (i > t), where T is the upper limit of the
prediction range.


Table  1 

Parameter and Prediction Ranges
As stated, Method 2 disregards failure counts for intervals before s,
where 1 ≤ s ≤ t. In this section we apply Method 2 with respect to the MSE
criterion. In all examples α and β are estimated in the range i = 1-20 and
failure count predictions are made in the range i = 21-30, where an
interval is 30 days of continuous execution of the Space Shuttle
software.

Failure Prediction

Once α and β have been estimated for a given s, using one of the three
methods, various predictions can be made. However, since we want to find
(α*, β*, s*), the computational procedure is to first use the MSE
criterion, which is described in the next section, to find the optimal
triple and then use it in the prediction equation. The predicted
cumulative number of failures is given by (7) for Method 2. This
equation is derived from (4), where $F_i - X_{s-1}$ replaces $X_{s,t}$,
reflecting the fact that $X_{s,t}$ only accounts for failures in the range
s,t. Failures in the range 1,s-1, which are accounted for by $X_{s-1}$, must
be included in (7).

$$F_i(s) = \frac{\alpha}{\beta}\left[1-\exp(-\beta(i-s+1))\right] + X_{s-1} \qquad (7)$$
The equation for Methods 1 and 3 is obtained by setting s = 1, using the
values of α*, β* obtained from the respective parameter estimation
methods, and setting $X_{s-1} = 0$.
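As a worked illustration of the Method 2 prediction equation, the sketch below evaluates the predicted cumulative failure count for a given interval. The function name `predict_cumulative` and the numeric values in the usage note are hypothetical and are not taken from the Space Shuttle data.

```python
import math

def predict_cumulative(i, s, alpha, beta, x_before_s=0.0):
    """Predicted cumulative failures through interval i, per equation (7).

    x_before_s is X_{s-1}, the observed cumulative count in intervals 1..s-1.
    Passing s=1 and x_before_s=0 gives the form used for Methods 1 and 3.
    """
    return (alpha / beta) * (1.0 - math.exp(-beta * (i - s + 1))) + x_before_s
```

With the hypothetical values alpha = 2.0, beta = 0.1, s = 4 and 12 failures observed before interval 4, the prediction for a very large i approaches alpha/beta + 12 = 32, which is the model's estimate of the total number of failures.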

Mean Square Error Criterion

The Mean Square Error criterion for cumulative failures is to minimize (8).

$$\mathrm{MSE}(s) = \frac{\sum_{i=s}^{t}\left[\frac{\alpha}{\beta}\left(1-\exp(-\beta(i-s+1))\right) - (X_i - X_{s-1})\right]^2}{t-s+1} \qquad (8)$$


The rationale of MSE (mean squared difference between predicted and
actual cumulative failure counts $X_i - X_{s-1}$ in the range s,t) is to
minimize the sum of the variance and the square of the bias of the
predicted failure count [JEN 68]. However, since a substantial amount of
computation could be involved in computing MSE for all values of s, we
adopt a modified rule of using the value of s where MSE starts to
increase after initially decreasing from s = 1; we call this value s' to
distinguish it from s*, the value of s where MSE is minimum. This approach
results in less computation and minimum discarding of "old" data. Our
experience indicates that s' will provide accurate predictions and much
better ones than s = 1 in those cases where s = 1 is not optimal.
Furthermore, we recognize that there is no assurance that s* computed in
the parameter estimation range will necessarily result in the minimum
MSE or most accurate prediction in the prediction range. In fact, as
will be seen, in some cases our heuristic produces better predictions
than that obtained with s*. The MSE for Methods 1 and 3 is obtained by
making the adjustments described in the previous section. Equation (8)
is plotted in Figure 1 for Module 1 of the Space Shuttle software for
both the parameter estimation range and the prediction range. To obtain
the latter, we modify (8) to use summation limits of t+1 to T and to use
a denominator of T-t, where T is the upper limit of the prediction
range. This figure shows s' = 4 and s* = 11 in both curves.
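The selection rule can be sketched in Python as follows. This is an illustration under stated assumptions: `mse` evaluates the Method 2 criterion for parameters that have already been estimated for a given s, and `pick_start` interprets s' as the first s whose successor has a larger MSE. The function names and the list layout are hypothetical.

```python
import math

def mse(s, t, alpha, beta, cum):
    """Mean square error over s..t; cum[i] is X_i and cum[0] must be 0."""
    total = 0.0
    for i in range(s, t + 1):
        predicted = (alpha / beta) * (1.0 - math.exp(-beta * (i - s + 1)))
        actual = cum[i] - cum[s - 1]  # failures observed in s..i
        total += (predicted - actual) ** 2
    return total / (t - s + 1)

def pick_start(mse_by_s):
    """Return (s_prime, s_star) from MSE values listed for s = 1, 2, ...

    s_prime: first s after which MSE increases (the stopping heuristic).
    s_star:  s with the smallest MSE over all listed values.
    """
    s_prime = len(mse_by_s)  # used if MSE never increases
    for idx in range(len(mse_by_s) - 1):
        if mse_by_s[idx + 1] > mse_by_s[idx]:
            s_prime = idx + 1
            break
    s_star = min(range(len(mse_by_s)), key=mse_by_s.__getitem__) + 1
    return s_prime, s_star
```

Because the search for s' stops at the first increase, parameters need to be estimated only for s = 1 through s' + 1, whereas finding s* requires an estimate for every candidate s.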

In order to provide a measure of prediction accuracy that is
independent of the MSE criterion, we compute the mean relative error
(MRE) for the prediction range, which is given by (9) [KHO].

$$\mathrm{MRE} = \frac{1}{T-t}\sum_{i=t+1}^{T}\frac{|X_i - F_i|}{X_i} \qquad (9)$$

This result is shown in Figure 2, where MRE and MSE (repeated from
Figure 1) are plotted for Module 1. For MRE, we again have s' = 4 and
s* = 11. Now, we compare $F_i(4)$ with $F_i(1)$ in Figure 3 and see that
s' = 4 provides a better prediction than s = 1, with the latter showing too
much overshoot.
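The mean relative error can be computed directly from the actual and predicted cumulative counts in the prediction range. The sketch below is a minimal illustration; the function name and the use of dictionaries keyed by interval number are assumptions made for the example.

```python
def mean_relative_error(actual, predicted, t, T):
    """Mean relative error over the prediction range t+1..T.

    actual[i] is the observed cumulative count X_i and predicted[i] is F_i;
    both are mappings from interval number to value.
    """
    total = 0.0
    for i in range(t + 1, T + 1):
        total += abs(actual[i] - predicted[i]) / actual[i]
    return total / (T - t)
```

Unlike MSE, this measure is scale-free, so values for modules with different failure totals can be compared directly.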

These procedures are repeated for Module 2 in Figures 4, 5 and 6 and
for Module 3 in Figures 7, 8 and 9. The prediction curves in Figures 3,
6 and 9, which all show better prediction accuracy for s' as compared to
s = 1 (s = 2 is used for comparison for Module 2 because estimates of α and
β could not be obtained with s = 1 for this module), dramatize the
importance of using data aging, where appropriate. The analysis of the
starting interval is summarized in Table 2. When s' is obtained from MSE
for the parameter estimation range, it produces the "best" s for Module 2
(MRE(6) differs from MRE(7) by only .03) and Module 3, as determined by
the MSE and MRE for the prediction range, and provides better predictions
than s = 1 for all three modules. As the execution of the software
continues for T > 30, the described procedures would be repeated with
t = 30 (i.e., new upper limit of parameter estimation range).


Table 2

Analysis of Starting Interval

Module   MSE: Parameter      MSE: Prediction     MRE: Prediction
         Estimation Range    Range               Range

1        s'=4, s*=11         s'=4, s*=11         s'=4, s*=11
2        s'=7, s*=7          s'=6, s*=6          s'=6, s*=6
3        s'=4, s*=10         s'=4, s*=4          s'=4, s*=4

SUMMARY, CONCLUSIONS AND FUTURE RESEARCH

We found that MSE does a good job of identifying s'; it has no
dependence on model assumptions and it minimizes the sum of the variance
and the square of the bias of the predicted failure count. We noted that
once MSE reaches a minimum, as a function of s, and starts to increase,
the computation can be terminated at that point because s' provides a
good (better than s = 1) prediction, although not necessarily the best
prediction.  Since  the  future  failure  process  may  not  mirror  the  past,  no 
criterion  can  produce  the  best  prediction  in  all  cases.  What  we  can 
accomplish  is  to  produce  better  predictions  than  would  be  the  case  in 
using  all  the  data.  This  we  have  demonstrated  with  the  examples.  Since 
the  other  Space  Shuttle  modules  have  failure  count  distributions  over 
execution  time  that  are  similar  to  the  ones  analyzed,  we  believe  data 
aging  is  applicable  in  general  to  the  Space  Shuttle  software.  Our 
results  suggest  that  other  software  reliability  models  could  benefit 
from  using  data  aging. 

The  next  stage  of  our  research  will  involve  the  use  of  Jet  Propulsion 
Laboratory  planetary  mission  data  and  Shuttle  mission  ground  control 
data  from  the  Johnson  Space  Center  to  determine  whether  data  aging  is 
applicable  to  different  environments.  In  addition  we  will  analyze  the 
MSE  criterion  relative  to  the  use  of  Method  3  and  we  will  report  our 
results  in  obtaining  improved  time  to  next  failure  predictions  by  using 
data  aging. 

DISCLAIMER 

The  analysis  of  experimental  results  of  the  intermediate  software 
failure  data  in  this  paper  should  not  be  construed  as  a  prediction  of 
the  final  Space  Shuttle  software  reliability.  Rather,  the  Space  Shuttle 
data  is  used  as  real  project  examples  for  the  purpose  of  developing, 
enhancing  and  validating  software  reliability  models. 
[Figure 1: Mean Square Error vs. s (starting interval) for the parameter estimation range (1-20) and the prediction range (21-30). Method 2, Module 1.]

[Figure 2: Mean Square Error (MSE) and Mean Relative Error (MRE) in the prediction range 21-30 vs. s (starting interval). Method 2, Module 1.]

[Figure 3: Predicted and actual cumulative failures vs. execution time (intervals), for s = 4 (MSE criterion) and s = 1. Method 2, Module 1.]

[Figure 4: Mean Square Error vs. s (starting interval) for the parameter estimation range (1-20) and the prediction range (21-30). Method 2, Module 2.]

[Figure 5: Mean Square Error (MSE) and Mean Relative Error (MRE) in the prediction range 21-30 vs. s (starting interval). Method 2, Module 2.]

[Figure 6: Predicted and actual cumulative failures vs. execution time (intervals), for s = 7 (MSE criterion) and s = 2. Method 2, Module 2.]

[Figure 7: Mean Square Error vs. s (starting interval) for the parameter estimation range (1-20) and the prediction range (21-30). Method 2, Module 3.]

[Figure 8: Mean Square Error (MSE) and Mean Relative Error (MRE) in the prediction range 21-30 vs. s (starting interval). Method 2, Module 3.]

[Figure 9: Predicted and actual cumulative failures vs. execution time (intervals), for s = 4 (MSE criterion) and s = 1. Method 2, Module 3.]
Design Structuring for System Engineering
1. Overview of Design Structuring

The Design Structuring and Allocation Optimization (DESTINATION) methodology provides a
systems engineer with a mechanism for making design decisions based on design optimization and
trade-off analysis. The methodology can be used as a front-end methodology for building huge,
complex, real-time systems.

Most existing front-end methodologies provide a mechanism for specifying system design but lack
a method of helping the systems engineer in determining a "good" system design. The DESTINATION
methodology provides tools for Design Structuring, Resource Allocation, Design Evaluation
and Optimization. Figure 1 shows the organization of the DESTINATION methodology.


Figure 1: The DESTINATION Methodology.

This paper describes the Design Structuring component of DESTINATION. The system design
specification and the system requirements of software, hardware and human interfaces of a system to
be built are inputs for the Design Structuring component. The system design will be optimized
through an iterative process of Design Structuring and trade-off analysis in conjunction with
Resource Allocation, Design Evaluation and Optimization.
This paper is organized as follows: Section 2 explains terminology used in this paper. The objectives
of Design Structuring are discussed in Section 3. The Design Structuring techniques are presented in
Section 4. An example of Design Structuring is given in Section 5. This paper is summarized and
concluded in Section 6.

2.  Terminology 


2.1.  Design  Element 

A system design can be represented by a graph. The nodes and the edges of the graph have attributes
which express more detailed system design information. A design element is a portion of such a
graphical representation of a system design. A design element consists of a single node or edge, or a
collection of multiple nodes and/or edges.

2.2. Design Structuring

Design Structuring is an engineering activity to construct graphical representations of the system
design (including hardware, software and people). The goal of this activity is to produce a design that
satisfies requirements in an optimal, or near optimal, manner. From a functional perspective, Design
Structuring starts with system requirements, optimization criteria, and possibly an existing design
and produces a new system design. There are five basic design structuring operations:
Decomposition and Recomposition, Fragmentation and Defragmentation, and Replacement, which
are defined below.

2.3. Decomposition

Decomposition generates additional design constructs at a more detailed level in the graphical design
representation hierarchy (lower level of abstraction). Figure 2 illustrates that the details of a
design element can be viewed via Decomposition. Node A and edges x, y and z of the data flow
diagram are decomposed into nodes (A1, A2, A3 and A4) and the corresponding edges (x1, x2, y and z)
connecting them.

2.4. Recomposition

Recomposition aggregates a design element of the hierarchy which consists of multiple nodes and
connections. Thus Recomposition is an inverse operation of Decomposition. In Figure 2, nodes A1,
A2, A3 and A4, and their edges (x1, x2, y and z) are merged into a node, A, and edges, x, y and z, via
Recomposition.
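Recomposition can be illustrated on a graph stored as a set of labelled edges. The Python sketch below is a hypothetical illustration, not part of DESTINATION: the function name, the edge representation and the example topology are assumptions, and merging the labels x1 and x2 back into a single edge x would need a label map that is omitted here.

```python
def recompose(edges, group, merged):
    """Collapse the nodes in `group` into the single node `merged`.

    edges is a set of (source, target, label) triples. Edges that lie
    entirely inside the group are hidden at the higher level of abstraction.
    """
    result = set()
    for src, dst, label in edges:
        new_src = merged if src in group else src
        new_dst = merged if dst in group else dst
        if new_src == merged and new_dst == merged:
            continue  # internal edge, not visible after recomposition
        result.add((new_src, new_dst, label))
    return result

# Hypothetical detailed diagram: A1..A4 with external edges x1, x2, y and z.
detailed = {
    ("In", "A1", "x1"), ("In", "A2", "x2"),
    ("A1", "A3", "p"), ("A2", "A3", "q"), ("A3", "A4", "r"),
    ("A4", "Out", "y"), ("A3", "Out", "z"),
}
abstract = recompose(detailed, {"A1", "A2", "A3", "A4"}, "A")
```

Decomposition is the reverse mapping: it would replace node A by the stored sub-graph and reattach the external edges to the appropriate sub-nodes.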




Figure 2: Decomposition/Recomposition Example (Data Flow Diagram).


2.5. Fragmentation

Fragmentation replaces a design element with a more complicated design element within a given
level of the graphical design representation hierarchy (same level of abstraction). Replication of the
same design element is a special case of Fragmentation. As shown in Figure 3, a node of a data flow
diagram, A, and edges, x and y, can be replaced by a set of nodes, A1, A2, A3 and A4, and edges via
Fragmentation.

2.6. Defragmentation

Defragmentation replaces a design element which consists of multiple nodes and edges with a simpler
design element which has fewer nodes and edges. In Figure 3, a design element with
multiple nodes, A1, A2, A3 and A4, and their edges, x and y, is merged into a simpler design element
with a single node, A, and edges, x and y, via Defragmentation. Thus Defragmentation is an
inverse operation of Fragmentation.
2.7. Replacement

Replacement refers to a design structuring operation whereby a design element (either within the
same or a different level of abstraction) is exchanged with another design element. Replacement
does not include either Decomposition/Recomposition or Fragmentation/Defragmentation. The
interface (i.e., connections) to the replaced design element remains the same. There is not necessarily a
one-to-one mapping of nodes. Figure 4 shows an example of Replacing a design element in a graph
which corresponds to three decision points. The corresponding metrics, such as McCabe, Myers'
Extension, etc., are shown at the bottom. Figure 4 illustrates why one design structuring may be preferable
[HMKD82].
Figure 4: Replacement Example.


3.  Objectives 

A systems engineer may be interested in design structuring for three general reasons.

3.1.  Facilitate  Mapping  across  Design  Capture  Models 

The system engineer may want to structure a design so that mapping logical models onto
implementation models, and vice versa, becomes easier, more intuitive, or more amenable to
optimization. This also applies to the mapping of resources within the implementation model,
particularly mapping software onto hardware.
3.2. Optimize System Design Factors (SDF)


The system engineer would like to synthesize and trade off multiple designs that optimize for a single
criterion and eventually for multiple criteria for the values of system design factors (i.e.,
non-functional requirements). The list below contains examples of three SDFs that are of particular
interest.


(1). Performance

Performance can be improved by replicating (a special case of Fragmentation)
design units to increase parallelism or by defragmenting to decrease
communication overhead. Defragmentation and replication represent
design structuring techniques.

(2). Dependability

Similar to Performance, Dependability may be improved through Replication
or Defragmentation.

(3). Maintainability

Improving system maintainability is strongly influenced by understandability,
which is similar to the first objective. One example of design
structuring which impacts maintainability would be to analyze a graph
which depicts the relationships of global variables and restructure it to
minimize module coupling.

3.3. Improve Design Understandability

The  most  common  way  to  measure  understandability  has  been  through  various  complexity  metrics. 
Examples  include  size,  McCabe,  scope,  etc. 

4.  Technique 

Design Structuring techniques (be they heuristics or algorithms) arise from one of three areas: the
application domain, a design methodology, or graph theory. Each of these is briefly mentioned
below.

4.1.  Application  Domain 

There are many domain-specific factors in system design. For example, Fragmentation is useful in
improving parallelism in a massively parallel processing environment. On the other hand,
Fragmentation can make a diagram complicated if the diagram is to be displayed in a graphically based
system. Certainly in Navy applications, such as sonar processing, there are devices and functions
that are specific to a family of systems. Experts in a particular domain have developed design
structuring principles over time that can be captured and applied.
4.2. Design Methodologies

Heuristics depend on a design methodology of a system such as Object-Oriented Design, Structured
Design, Task Structuring, Partitioning, etc. For example, the Task Structuring technique of
ADARTS (Ada Based Design Approach for Real-Time Systems) [ADRT88] can be used.

4.3. Graph Theory

The complexity of a design can be measured as the number of nodes and connections in a diagram. For
instance, it is desirable to have seven plus or minus two bubbles in any individual diagram to facilitate
human comprehension. However, this kind of simplistic guideline may easily be broken. Other
graph-theoretic measures involve the number of decision points, crossing edges, etc.
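Simple graph measures of this kind are easy to compute from an edge list. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the cyclomatic number uses McCabe's formula V(G) = E - N + 2 under the assumption that the diagram forms a single connected graph.

```python
def diagram_metrics(edges):
    """Compute basic size and complexity measures for one diagram.

    edges is a list of (source, target) pairs describing the diagram.
    """
    nodes = {node for edge in edges for node in edge}
    n_nodes, n_edges = len(nodes), len(edges)
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        # McCabe's cyclomatic number for a single connected graph
        "cyclomatic": n_edges - n_nodes + 2,
        # the "seven plus or minus two" readability guideline
        "within_7_plus_minus_2": 5 <= n_nodes <= 9,
    }

# A single if/else decision: one decision point gives V(G) = 2.
if_else = [("entry", "cond"), ("cond", "then"), ("cond", "else"),
           ("then", "exit"), ("else", "exit")]
metrics = diagram_metrics(if_else)
```

Comparing such values before and after a structuring operation gives a quick, if coarse, indication of whether the operation simplified the diagram.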

5.  Example 

Figure 5 graphically represents the domain of Design Structuring. A system engineer performs Design
Structuring according to objective, operation, technique, system perspective, and design capture
view.
This section presents a sample Design Structuring technique based on ADARTS, a system design
methodology for structuring a system into concurrent tasks (or active objects). The performance of
the system can be increased through the task structuring process. In order to develop more
maintainable and reusable components, ADARTS provides criteria for identifying modules (or passive
objects). The ADARTS methodology is supported by many companies such as Software Productivity
Consortium and Cadre. The DESTINATION methodology can utilize the tools from these
companies. It can use ADARTS for system design and optimize the system design through iteration of
software design with hardware decisions and organizational lessons with respect to System Design
Factors (SDF) [SDF92].


5.1.  ADARTS 

Figure 6 illustrates the steps of the ADARTS methodology. The first step of ADARTS is to express a
system design in Real Time Structured Analysis (RTSA). Then concurrent tasks (in Dynamic View
or Task Architecture Diagram) and modules (in Static View or System Architecture Diagram) are
identified. A system architecture is derived from those diagrams.


Figure  6:  Steps  in  the  ADARTS  Methodology. 
5.2. Cruise Control System Example

A cruise control system of an automobile is used to demonstrate the Design Structuring method
based on ADARTS. An overall data flow diagram of a cruise control system is given in Figure 7.
The ADARTS task structuring criteria are applied to the data flow diagram of the cruise control
system design. A system engineer obtains a task architecture diagram as shown in Figure 8 from the data
flow diagram.

The criteria of object-oriented design are used for information hiding of hardware, software and
people of the system. A number of design structuring operations are applied to the task architecture
diagram in Figure 8. For example, a task, "Monitor Auto Sensors", in Figure 8 is "decomposed". The
same task in Figure 9 contains three objects denoting sensing devices. Another task, "Auto Speed
Control", in Figure 8 is "recomposed" to become an object, "Auto Speed Control", in Figure 9. The
message queue, "Sensor Change", in Figure 8 is "replaced" by a new object, "Cruise Control Event
Buffer". The result is a system architecture diagram shown in Figure 9.


Figure 7: Overall Data Flow Diagram (Cruise Control System).

Figure 8: Task Architecture Diagram (Cruise Control System).
6. Conclusion


A  research  effort  of  developing  a  Design  Structuring  method  has  been  presented  in  this  paper.  The 
goal  is  to  produce  optimal  or  near  optimal  system  design  that  satisfies  system  requirements.  We  have 
discussed  terminology,  objectives,  technique,  and  an  example  of  Design  Structuring. 

We are currently investigating a System Design Structuring methodology based on the ADARTS
approach. The methodology involves system engineering through iteration of software design with
hardware decisions and organizational lessons with respect to SDFs. A set of Design Structuring
criteria according to SDFs (System Design Factors) is being researched, and a method of optimizing
a system design with respect to SDFs is being studied.

A prototype of DESTINATION will be constructed for demonstration. The Design Structuring
component of the prototype will utilize the Design Structuring criteria, the optimization method and
the CASE tools supporting ADARTS.
Abstract

A platform for complex real-time applications is presented. The platform consists of a number
of on- and off-line components: a RISC architecture, a runtime kernel, a generic interconnected
network, a specification and design language, a programming language, compiler, linker,
schedulability analyzer, assignment tool, and a graphical user interface. All components adhere to stringent
requirements of predictable real-time computation. The integrated platform has been prototyped
and a successful initial evaluation has taken place. We are now extending the platform to
accommodate the requirements of two collaborative Navy projects, in task allocation and optimization
and in component re-engineering.


1  Introduction 

The emerging generation of complex real-time applications requires application development
and execution platforms which integrate a number of traditional and novel components.
Essentially, a suitable platform needs to accommodate the inherent complexity, distribution, parallelism,
adaptability and non-functional requirements (such as time criticality, fault tolerance, dependability
and security) of these new applications, characteristics not necessarily found in older real-time
applications. In this paper, we report briefly on such a platform, which we have designed and prototyped
at New Jersey Institute of Technology's Real-Time Computing Laboratory.

We recognize the following steps, and thus the corresponding need for a platform component, in
the development and execution lifetime of a complex real-time application. Initially, an application is
specified and then designed, using a visual specification and design language. Next, the application is
constructed and/or synthesized from pre-existing, reusable components. The pre-existing components
are selected on the basis of their interfaces. These components are managed through the Component
Manager, not discussed here but presented in [4]. New components are developed through translation
of the specification and design language as well as naturally through programming in a high-level
language. The resulting application program is subsequently compiled piece-by-piece down to executable
form, and then linked. During compilation, essential timing and other non-functional information
is extracted (using the front-end function of a schedulability analyzer). A Directed Acyclic Graph
(DAG) of software application components (processes and objects) is constructed, for display and
further analysis. The DAG is subsequently analyzed by the assignment analysis tool, and the software
components are assigned to processing elements (PEs) of the generic interconnected network. Next,
the components are loaded and the application commences execution. Specifically, each PE runs a
copy of a runtime kernel, which executes instructions of the software components and initiates
communication. The network (with its own kernel) processes communication messages. The application
and the platform (or more specifically, the DAG and the interconnected network of PEs) are
displayed graphically in windows. In addition, the performance of the application and the platform
is monitored and displayed. Finally, there is a command window which works in conjunction with
other windows, and allows the user to re-assign components, change performance monitor attributes,
provide application data and so forth.

In  what  follows,  we  outline  each  application  platform  component  and  summarize  relevant  current 
and  future  activities. 

2  Specification  and  Design  Language 

To present multiple functional and non-functional views of complex real-time applications, we have
designed a rigorous specification and design description language, RT-Chart. RT-Chart specifications
represent a system as a set of real-time processes, each composed of a set of actions. The first
action of a process is performed at the start of each process activation. Upon completion, the initial
action invokes a successor action, passing some data. Each action requires a set of resources (hardware
or software), which are claimed upon commencement of the action and are released upon completion
of the action. RT-Chart provides a resource algebra for stating the modes in which resources can be
used: (1) may be used concurrently, (2) must be used concurrently, (3) cannot be used concurrently, and
(4) cannot be used concurrently and must be used in a particular order. Additionally, and-gates allow
the specification of parallel actions, and or-gates indicate conditional execution of one among a set of
actions. Furthermore, in RT-Chart, the actions may be hierarchical, to enable macro-level reasoning
and specification. In addition to functionality, timing and parallelism, there are other important
system aspects that RT-Chart allows to be expressed. Security classification levels may be indicated
for information flows, code (actions), resources, levels of hierarchy, and implementation details. To
allow dependability to be dealt with, degree of redundancy or reliability can be specified for actions
or processes. Another aspect of systems is relative criticality, which can also be expressed for actions
and processes.
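The four resource-usage modes can be pictured with a small data model. The Python sketch below is purely illustrative and does not reproduce RT-Chart syntax or semantics: the names `Mode`, `Action` and `may_overlap`, and the idea of declaring a mode per pair of resources, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    MAY_CONCUR = 1         # (1) may be used concurrently
    MUST_CONCUR = 2        # (2) must be used concurrently
    EXCLUSIVE = 3          # (3) cannot be used concurrently
    EXCLUSIVE_ORDERED = 4  # (4) cannot be used concurrently, fixed order

@dataclass
class Action:
    name: str
    resources: frozenset                      # claimed at start, released at end
    successors: list = field(default_factory=list)

def may_overlap(a, b, modes):
    """True if no resource pair claimed by a and b is declared exclusive.

    modes maps frozenset({r1, r2}) to a Mode; undeclared pairs are assumed
    to be MAY_CONCUR. Two actions claiming the same resource are not
    treated here.
    """
    for ra in a.resources:
        for rb in b.resources:
            if ra == rb:
                continue
            mode = modes.get(frozenset((ra, rb)), Mode.MAY_CONCUR)
            if mode in (Mode.EXCLUSIVE, Mode.EXCLUSIVE_ORDERED):
                return False
    return True

# Hypothetical actions and one declared constraint.
read = Action("read_sensor", frozenset({"adc"}))
log = Action("log_value", frozenset({"disk"}))
send = Action("send_value", frozenset({"bus"}))
modes = {frozenset(("adc", "bus")): Mode.EXCLUSIVE}
```

A check of this kind is the sort of question an and-gate raises: two parallel branches are only meaningful if the resources they claim are allowed to be in use at the same time.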

Currently, RT-Chart is implemented only partially. There is a visual, graphical interactive user
interface to create and modify RT-Chart specifications. However, the translator from RT-Chart
to RTX or call-DAGs (see Sections 3 and 4) has not yet been completed.

3  High-Level  Language 

A predictable real-time language, RTX, has been designed, based on the real-time model of Real-Time
Euclid [7, 3] applied to the ADT/ADO model of RESOLVE [5, 14]. The resulting language features
abstract data objects, real-time processes and time-bounded constructs. No arbitrarily long
computation is allowed. Thus, recursion is forbidden, loops are unrollable, and dynamic data operations are
not allowed.
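A restriction such as the ban on recursion can be checked statically by looking for cycles in the call graph. The Python sketch below shows one common way to do this with a depth-first search. It is an illustration of the idea only, not a description of how the RTX compiler enforces the rule, and the function name and graph representation are assumptions.

```python
def find_recursion(call_graph):
    """Return a list of names forming a call cycle, or None if there is none.

    call_graph maps each subprogram name to the names it calls.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(fn):
        color[fn] = GREY
        path.append(fn)
        for callee in call_graph.get(fn, ()):
            state = color.get(callee, WHITE)
            if state == GREY:
                # callee is already on the current path: a call cycle
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        color[fn] = BLACK
        return None

    for fn in list(call_graph):
        if color.get(fn, WHITE) == WHITE:
            cycle = visit(fn)
            if cycle:
                return cycle
    return None
```

Once the call graph is known to be acyclic and every loop has a fixed bound, the execution time of each subprogram can be bounded by a finite traversal.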
One interesting feature of RTX, which makes accurate timing predictions even more
challenging than in other real-time languages, is that a call in RTX may proceed in parallel with the
caller that made the call until either a control or data dependency forces synchronization or the call
returns. Moreover, since RTX inherits RESOLVE's strict object-orientation, every statement in an
RTX program is in fact either a primitive operation or a call (thus contributing to the complexity of
timing analysis).


4 Compiler, Schedulability Analyzer, DAG Generator and Related Tools

A compiler for RTX has been developed. While the compiler supports such interesting features as
generic ADTs, the emphasis of this work is on supporting real-time features. Consequently, the
compiler operates in an integrated fashion with a front-end schedulability analyzer (see below). In
addition to schedulability information and conventional RISC instructions to be executed later (see
Section 7), the compiler also outputs information used by the DAG generator (see below).

A schedulability analyzer operates similarly to the one for Real-Time Euclid [9, 7, 1]. A program
is analyzed for schedulability in two stages. The schedulability analyzer consequently consists of two
parts: a partially language-dependent front end and a language-independent back end. The front end
is incorporated into the code emitter, and its task is to extract, on the basis of program structure and
the code being generated, timing information and calling information from each subprogram, process
or object, and to build language-independent program trees. The front end of the analyzer does not
estimate interprocess contention. However, it does compute the amount of time individual statements
and subprogram and process bodies take to execute in the absence of calls and contention. These
times, serving as lower bounds on response times, are reported back to the programmer.
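
As a rough illustration of how a contention-free execution time can be computed over a program tree, consider the sketch below. The node types (`Prim`, `Seq`, `If`, `Loop`) and the cycle costs are assumptions made for this example; the paper does not give the actual tree format used by the front end.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Prim:
    cycles: int              # cost of one primitive operation (assumed known)


@dataclass
class Seq:
    parts: List["Node"]      # statements executed one after another


@dataclass
class If:
    test: "Node"
    then: "Node"
    orelse: "Node"


@dataclass
class Loop:
    bound: int               # loops are unrollable, so the bound is static
    body: "Node"


Node = Union[Prim, Seq, If, Loop]


def exec_time(node):
    """Longest-path execution time, ignoring calls and contention."""
    if isinstance(node, Prim):
        return node.cycles
    if isinstance(node, Seq):
        return sum(exec_time(p) for p in node.parts)
    if isinstance(node, If):
        return exec_time(node.test) + max(exec_time(node.then), exec_time(node.orelse))
    if isinstance(node, Loop):
        return node.bound * exec_time(node.body)
    raise TypeError(f"unknown node type: {type(node).__name__}")
```

Because calls and contention can only add delay, a figure computed this way bounds the response time from below, which matches the role the front-end times play in the text.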

The back end of the schedulability analyzer is actually a separate, language-independent program.
Its task is to correlate all information gathered and recorded in program trees by the front end, and to
predict guaranteed response times for the entire real-time application. To achieve this task, this part
of the analyzer maps the program trees onto an instance of a real-time task model, and then computes
the response time guarantees.

The back end of the analyzer employs a potentially exponential-time technique, frame superimposition,
to estimate contention accurately. To reduce the problem space (in terms of the
number of possible execution paths) that frame superimposition has to consider, we employ program
transformations and restricted resource contention [10], conditional linking [11], and aperiodic process conversion
to periodic (using common techniques such as in [2]). The transformer balances and pads alternate
execution paths in conditional statements, ensuring that regardless of the path taken, timewise both
the execution of the thread in question and the contention the thread generates are the same as in the
case of one original, dominating path. The linker eliminates paths that are infeasible, given logical
relationships among conditional test variables. Restricted resource contention means that arbitrary
contention schemes for resources are reduced to a small number of dominating schemes, at the expense
of some time loss. A typical scheme may force multiple threads contending for the same resource to be
released only after all of them have used the resource. Finally, aperiodic-to-periodic process conversion
involves replacing processes with longer frames (minimal activation separation periods [7]) but otherwise
arbitrary times, with regular (such as periodic or jittery) processes with identical per-activation
resource requirements but tighter frames.
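
The balancing and padding of conditional paths can be sketched as follows. This is only an illustration under simplifying assumptions: each path is a list of instruction costs, padding is done with one-cycle no-ops, and the names `balance` and `NOP_CYCLES` are hypothetical rather than taken from the transformer described in [10].

```python
NOP_CYCLES = 1  # assumed cost of one padding instruction


def balance(paths):
    """Pad every alternate path up to the time of the dominating (longest) path.

    paths maps a branch label to the list of instruction costs on that branch.
    """
    target = max(sum(costs) for costs in paths.values())
    padded = {}
    for label, costs in paths.items():
        missing = target - sum(costs)
        padded[label] = list(costs) + [NOP_CYCLES] * (missing // NOP_CYCLES)
    return padded
```

After balancing, every branch takes the same time, so the analyzer needs to consider only one path per conditional instead of one per branch.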

Daggen  (the  DAG  generating  tool)  translates  RTX  compiler  output  into  a  sequence  of  numbers  used 
by  the  DAG  Browser  of  the  Graphical  User  Interface  to  draw  a  Directed  Acyclic  Graph.  The  Directed 
Acyclic  Graph  (DAG)  extracted  by  Daggen  represents  a  collection  of  processes  and  objects.  DAG 
edges  represent  calls  from  processes  or  operations  of  objects  to  operations  of  (other)  objects.  (Thus, 
we also refer to this graph as the call-DAG). Since RTX disallows recursion (operation- or object-level),
the graph is indeed acyclic. Daggen builds the call-DAG of the application by processing each object
declaration in a top-down recursive fashion, beginning with the top-level processes. Adjacency lists
are constructed, along with parameters, their directions and other attributes, for each call possibility.
For display purposes, at most a single edge is represented for any two nodes.
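
A minimal sketch of such a top-down construction is given below. The input format (a mapping from each process or object to the names it may call) and the function name `build_call_dag` are assumptions; Daggen's real input is RTX compiler output, whose format is not described here.

```python
def build_call_dag(calls, roots):
    """Build adjacency lists for the call-DAG, starting from top-level processes.

    calls maps a node name to the list of nodes it may call (duplicates allowed).
    roots lists the top-level processes.
    """
    adjacency = {}
    on_path = set()  # nodes on the current descent, used to detect recursion

    def visit(node):
        if node in on_path:
            raise ValueError(f"recursive call through {node!r}")
        if node in adjacency:
            return  # already processed via another caller
        on_path.add(node)
        # Keep one edge per callee, in first-seen order.
        callees = list(dict.fromkeys(calls.get(node, [])))
        adjacency[node] = callees
        for callee in callees:
            visit(callee)
        on_path.discard(node)

    for root in roots:
        visit(root)
    return adjacency
```

Collapsing duplicate callees mirrors the display rule of at most one edge between two nodes; the recursion check is redundant for valid RTX programs but makes the acyclicity assumption explicit.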


5  Assignment  of  Processes  and  Objects 

We have developed and continue improving upon accurate and efficient performance prediction functions
for real-time software components (processes and objects) that are being assigned [13, 14, 6] to
processing elements (PEs) in parallel systems and that allow RTX-style asynchronous operation calls.
The functions (1) consider load balancing, parallelism and deadlines, (2) predict performance on the
basis of projected service rates, and (3) treat PEs and communication links in an integrated manner.

We  have  undertaken  a  quantitative  evaluation  which  has  demonstrated  that  the  functions  perform 
quite well [12]. The assignment tool is currently used before execution, to assign processes and objects
to  the  PEs  of  our  platform  (see  Section  6).  Eventually,  the  tool  will  also  be  used  to  re-assign  processes 
and  objects  dynamically,  as  the  need  arises. 


6  Processing  Elements  and  Networking 

The  platform  architecture  consists  of  processing  elements  (PEs)  interconnected  by  a  generic  network. 
Each  PE  is  controlled  by  a  replica  of  the  PE  kernel  (see  Section  7).  The  PE  architecture  is  currently  a 
RISC computer based on [14]. While the architecture is mostly conventional, it includes features for efficient
loading, execution and cloning of objects and processes. Furthermore, the architecture is mostly
predictable in its real-time behavior, in the sense of the architecture in [1]. Potentially unpredictable features
of the architecture, such as dynamic memory allocation and pipelining, are made predictable
by the compiler (which disallows memory allocation past module load, and swaps/pads post-branch
instructions, for instance).

Each PE is equipped with communication ports. There is no limit on the number of PEs in the
system, and we have experimented with a number of topologies (a ring, a bus, a mesh, a hypercube).
We are building a generic network topology constructor to enable arbitrary interconnection of common
topological structures (e.g. five hypercubes and three rings connected by a bus).

The  architecture  provides  PE  and  communication  link  resources.  Each  resource  is  assumed  to  be 
schedulable  separately,  for  generality.  Every  resource  has  its  own  queue  for  requests  and  is  free  to 
employ  any  predictable  scheduling  policy. 
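
One way to picture separately schedulable resources is a small class in which every resource owns its request queue and is handed its own ordering rule. The sketch below is an assumption-laden illustration: the `Resource` class, the request dictionaries and the two example policies are hypothetical and are not the platform's actual data structures.

```python
import heapq
from itertools import count


class Resource:
    """A PE or a communication link with its own queue and scheduling policy."""

    def __init__(self, name, priority_of):
        self.name = name
        self._priority_of = priority_of  # policy: smaller value is served first
        self._queue = []
        self._arrival = count()          # tie-breaker that preserves arrival order

    def submit(self, request):
        entry = (self._priority_of(request), next(self._arrival), request)
        heapq.heappush(self._queue, entry)

    def next_request(self):
        return heapq.heappop(self._queue)[2] if self._queue else None


def fifo(request):
    return 0                             # all equal, so arrival order decides


def earliest_deadline(request):
    return request["deadline"]
```

With this shape, a link can run `fifo` while a PE runs a deadline-based rule, and neither choice affects the other; earliest-deadline is used here purely as an example of a predictable policy.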


7  PE  and  Network  Kernels 

The  operating  system  component  of  the  platform  consists  of  small  kernels  for  PE  and  network  control, 
respectively.  Each  PE  is  equipped  with  a  replica  of  the  PE  kernel,  whose  tasks  include  process 
scheduling,  initiating  and  receiving  messages,  and  gathering  execution  information.  These  functions 
are  done  in  a  standard  fashion,  employing  common  and  straightforward  decisions  such  as  process 
priority  inheritance  and  deadline  detection.  Moreover,  the  PE  kernel  supports  both  synchronous  and 
asynchronous calls (see Section 3); calls inherit caller priority. Both preemptive and non-preemptive
scheduling are supported. The PE kernel allows the user to select synchronous or asynchronous call
mode. The PE kernel passes information on performance statistics and execution progress (as in who
is calling whom) to the graphical user interface (Section 8). Finally, the PE kernel is used to control
the speed of program execution (user-selectable, to enable execution progress at human speed for
observation).

The  network  kernel  executes  the  network,  including  message  transmission,  link  scheduling,  delay 
updates  (transmission  and  propagation  delays)  and  other  such  functions.  This  kernel  too  interfaces 
with the graphical user interface. Since the design of the architecture has deliberately kept the PEs
and the interconnection (and consequently their kernels) separate, there is a need to maintain common
granularity and value of time. This is currently achieved through a common router for messages and
time keeping. Future implementations are likely to also employ simple and reliable time management,
such as monitoring official time [1], rather than high-overhead, theoretical synchronization protocols.


8  Graphical  User  Interface 

We  have  built  a  graphical  user  interface  (GUI)  to  display  the  activities  in  the  system  and  to  accept 
commands  from  the  user.  The  GUI  currently  consists  of  the  command  window,  the  call-DAG  window, 
the  architecture  window  and  the  performance  monitoring  window. 

The command window controls platform and application execution. Through this window, the user
may instruct the platform to load, execute or terminate an application, to (re-)assign a process or
an object (this is done by identifying the command, the object in the call-DAG window and then the
PE in the architecture window; the PE kernel does not currently support this feature), to set the
program execution speed dynamically, and to open the other windows.

The call-DAG window displays an application in its call-DAG form. Using colors, line thickness and
other display features, the window clearly indicates executing, idle or blocked nodes, calls-in-progress
over edges, and call and return parts of calls. The window displays names or numbers of nodes.

The architecture window displays interconnected PEs. Again using colors, shapes and so forth,
the window indicates the status of each PE's processor (executing, idle, waiting), its communication
ports (idle, sending, receiving, waiting), and the status of communication links (transmitting, blocked,
reserved, idle). Call and return messages are shown. Finally, for any PE, the user may request a display
of the portions of the call-DAG corresponding to its processes and objects, optionally with their
call-DAG neighbors.

Finally,  the  performance  window  displays  sets  of  bar  graphs  and  an  index  table,  each  with  timing 
information  on  processes  (and  objects  and  their  operations)  that  run  to  completion.  Specifically,  the 
statistics  displayed  (and  requested  by  the  user  through  this  window)  include  observed,  predicted  (by 
the  assignment  tool  or  the  schedulability  analyzer)  and  historical  response  times,  laxities,  frames  and 
deadlines,  and  the  deviations  among  the  predicted  and  observed  sets  of  times.  Information  on  the 
last  termination  for  every  process  or  object  is  kept  and  displayed  upon  request,  as  is  information  on 
the last N terminations, for a user-specified N. The index table maps process or object numbers to
declared  names  and  their  associated  frames  and  deadlines,  if  applicable. 


9  State  of  the  Platform,  Current  and  Future  Activities 

Our  current  prototype  has  been  implemented  and  running  since  Spring’93.  PE  and  network  kernels, 
the  router  and  the  GUI  are  run  as  Unix^  processes  that  communicate  via  sockets.  The  GUI  and 
RT-CHART  windows  run  under  X  (X-view  and  Motif).  The  PE  kernel  uses  shortest-latency-first  with 
inheritance  to  schedule  processes,  and  the  two  kernels  use  FIFO  for  link  and  message  scheduling.  The 
time  granularity  is  kept  consistent  (between  the  PEs  and  the  network)  by  the  router  (that  also  keeps 
the  master  clock). 

^ Unix is a trademark of AT&T Bell Laboratories.

We are currently building a graphical DAG constructor to enable rapid prototyping of complex
applications. The constructor will fill in cycle-burners in place of real instructions, so that the resulting
symbolic application will exhibit the same desired timing and other non-functional behavior as the
ultimate real application. Now that we have a "proof-of-concept" prototype, we are about to undertake
a thorough re-design and re-engineering of the platform. One of our tasks in re-design is to facilitate
the use of common languages, such as Ada9X, in the spirit of real-time execution [8] and while enabling
re-use, re-engineering and other desirable features.

Finally,  we  are  engaged  in  two  collaborative  projects  with  NSWC:  one  in  task  allocation  and 
optimization,  and  the  other  in  software  component  re-engineering.  The  platform  is  used  in  both 
projects  and  we  have  begun  extending  it  to  meet  the  NSWC  requirements. 

We  are  indebted  to  the  Office  of  Naval  Research  and  the  Naval  Surface  Warfare  Center  (White 
Oak and Dahlgren) for making this project possible, and especially for facilitating what has now
become a productive, collaborative effort. We would like to thank all other members of the Real-Time
Computing Laboratory, past and present, regular and visitors, who have contributed immensely
to the creative and professional environment within the Lab.


1 Introduction

The trend towards collections of powerful processors connected by a communication network has motivated the development
of a number of distributed operating systems [3, 7, 8, 11] and many paradigms for building distributed applications
[1, 2, 4, 6, 9, 10]. The shift from centralized large computer systems to closely-coupled clusters of processors which provide
the illusion of a single system to the clients has been accelerated partially by the introduction of inexpensive microprocessors
and dense memory chips. However, the complexity of software design, development, testing and integration of distributed
systems poses a significant challenge, particularly for mission-critical applications. Strict timing constraints and availability
requirements often introduce additional complexity in the design of mission-critical distributed systems.

This paper introduces a testbed for prototyping fault-tolerant distributed systems. The testbed is structured as a set of
services layered between the operating system kernel and the application software. The primary objectives of the testbed
are: a) to allow development of complex mission-critical applications from a collection of building blocks with well-defined
APIs, and b) to build the infrastructure for experimenting with different distributed protocols.

2  Building  Blocks  and  Common  API 

We use the term cluster to refer to a collection of (perhaps) homogeneous processors connected by a communication
network that functions as a server to external clients. The term server is used to denote a collection of processes (on the
cluster of processors) that provide the illusion of a single system to the clients. A server may be a command & control
system, a real-time database machine, or an automated system for manufacturing and process control. Such clusters are
closely-coupled in the sense that a server is seen as a single system by the clients. The server software running on the
processors hides the distribution and replication of the resources in the cluster. The client's view is defined by the server
interface. One can view a closely-coupled server as a special case of distributed systems in which the collection of server
software running on the processors makes the distribution of resources transparent to the clients.

Building  and  managing  a  reliable  closely-coupled  cluster  that  functions  as  a  single  coherent  system  is  complicated  by 
several factors:

•  Asynchronous  nature  of  communication  (including  random  communication  delay) 

•  Distribution/replication  of  resources  across  the  system 

•  Changes to the set of processors that comprise the cluster due to a processor failing, leaving or rejoining the system.

In mission-critical systems, other requirements introduce additional complexity:

•  Strict timing constraints imposed on response time of the system

•  Dependability requirements on the delivery of certain services

•  Interoperability in an open system environment

The  primary  objective  of  this  work  is  to  provide  a  testbed  for  experimenting  with  primitives  for  building  dependable 
servers.  This  is  done  by  providing  a  collection  of  building  blocks  with  well-defined  APIs. 

3  Overview  of  Testbed 

The testbed consists of a collection of protocols for managing replicated and distributed resources in a system. It consists
of six software layers, each exporting a well-defined interface to the other layers or to applications that are built on top of
the testbed. Figure 1 illustrates the software layers in the testbed. Each software layer, referred to as a service, supports one
or more protocols. A brief description of each service layer follows:

•  multicast communication service: provides a reliable datagram communication service for sending a message to a
collection of destinations. This service allows the exploitation of available communication protocols (e.g., NetBIOS
vs. UDP) and possible hardware support (e.g., hardware broadcast facility) on a given system without exposing the
implementation to the higher layer services.

•  processor membership and monitoring service: provides a consistent view of the operational status of a group
of processors in the presence of processor/process failures/joins and communication failures. Three membership
protocols with varying degrees of consistency in the views of the members are supported.

•  clock synchronization service: provides a bound on the deviation between logical clocks on processors in the presence
of hardware clock drifts and failures.

•  reliable naming service: provides a reliable service mapping the name of an object to a list of processors in the
system. This layer supports multiple namespaces.

•  distributed cache service: provides shared/exclusive access to remote objects with local caching. This layer supports
multiple coherency protocols including cache invalidation and write-through policies.

•  replication service: provides a mechanism for maintaining multiple copies of objects in a cluster. This layer supports
several replication protocols with different consistency semantics for updating replicas.

•  distributed synchronization service: provides fault-tolerant and scalable synchronization protocols for serializing
access to shared/exclusive resources. The distributed synchronization service can recover from the failure of a lock
holder/coordinator and communication failures.

•  service failure detector: provides notification/query service for monitoring status changes of a collection of subsystems
grouped together as a server. A status change to a group can occur because of a subsystem failure or an update
to an application-defined status field.

4 Design Goals

The primary design goals of the testbed are application independence, portability to different operating systems and
scalability to a large number of processors in a cluster.

Application independence/portability:

The current design and implementation of the services in the testbed ensures independence from the applications. The
services are generic and are common to a range of servers that can be built on a cluster. Each software layer has an application
programming interface. The implementation can be ported easily to other UNIX-like operating systems.^ Since a large
portion of the operating system dependencies are encapsulated in a communication layer, these services are easily portable
to other workstation operating systems such as OS/2 and Mach. The layered design of the services allows experimenting
with several protocols for each layer without rewriting a substantial part of the code. For example, in the current version of
the software, three processor group membership protocols are supported (as described in [5]). The implementation of each
protocol consists of less than 1000 additional lines of code in C, while the surrounding 5000 lines in the membership layer
are untouched.

Performance:

Analysis and preliminary experimental results indicate that the current protocols in the testbed are scalable to clusters
with a moderate number of nodes. The current prototype is running in a cluster environment with 32 nodes. The layered
design enhances performance by taking advantage of the hardware architecture of the cluster. For example, the membership
and the replication protocols can take advantage of hardware multicast support if available in a cluster. Similarly, the
existence of a physical shared memory among a set of processors would allow a more efficient implementation of certain
protocols and avoid the need for some protocols. With the exception of the processor membership service, which is the
layer gluing all others together, there is no overhead for unused services. Since many of the protocols in the testbed are
intended for managing distributed resources and maintaining consistent copies of replicas, the scalability of these protocols
is an important goal. Our approach to meeting this goal is discussed next.

Scalability  of  distributed  protocols: 

Distribution or replication of resources (or objects) frequently leads to performance bottlenecks in a loosely-coupled
cluster. Local caching and accessing of shared objects is used to enhance performance, while cache coherency protocols are
used to maintain object consistency. The protocols are designed for cases in which objects are replicated a small number of
times, which enhances availability without compromising performance. In particular, we combine the replication protocol
with a caching protocol completely transparent to the user of the service. These performance-driven tradeoffs allow the
abstraction of a single reliable and coherent system to be scalable to a large number of processors, without a serious adverse
impact on availability. Since our design philosophy is to maintain a small number (fewer than 4) of replicas for any object,
we can optimize the replication protocols. The obvious disadvantage of a small number of replicas is that the
system may become unavailable in the event of multiple simultaneous failures, but such an occurrence is relatively rare in
practice.

Weaker  consistency  semantics: 

An important objective is to provide several protocols to satisfy different consistency requirements on multiple copies of
objects. In many cases, updates to a replicated object must be atomic. For example, multiple copies of a system configuration
must be kept consistent such that data integrity is ensured in the presence of failures. However, the consistency requirements
of other objects, such as sensor data, are much less stringent, and a recent version of the object can be considered to be a
good approximation of the latest version of the object. This is particularly true of dynamic attributes of an object which
are updated only at periodic intervals. For example, if the load on a machine is queried, a recent value may be sufficient
as opposed to the latest snapshot. Better performance can be achieved when the consistency requirements of an object can
be relaxed. We are now experimenting with the notion of periodic replication servers for maintaining replicated objects in
time-critical applications with strict timing constraints.

^ UNIX is a registered trademark of AT&T Bell Labs.
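
The relaxed-consistency idea can be pictured with a replica that is refreshed only at periodic intervals and otherwise serves a recent value. The sketch below is an assumption: the class `PeriodicReplica`, its `read(now)` interface and the explicit time argument are hypothetical, and it does not describe how the testbed's periodic replication servers are actually built.

```python
class PeriodicReplica:
    """Local copy of a remote attribute, refreshed at most once per period."""

    def __init__(self, fetch_latest, period, now=0.0):
        self._fetch_latest = fetch_latest  # callable that reads the primary copy
        self._period = period
        self._value = fetch_latest()
        self._refreshed_at = now

    def read(self, now):
        # Contact the primary only when the local copy is older than one period.
        if now - self._refreshed_at >= self._period:
            self._value = self._fetch_latest()
            self._refreshed_at = now
        return self._value
```

A query for machine load, as in the example above, would tolerate the staleness this introduces, whereas a system configuration object would need an atomic update protocol instead.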


5 Concluding Remarks

The current implementation of this testbed has been built on a network of RISC System/6000 processors running AIX.
The modularity of its design allows portability to other workstation operating systems with relative ease. The current implementation
has been built on several communication transport layers including the UDP datagram service, a multicast communication
service and the NetBIOS datagram service. We are also experimenting with a transport layer on a very fast point-to-point
communication network.

The software layers in this testbed are divided into eight distinct layers residing between the application and the operating
system kernel. Each layer consists of one or more distributed protocols. Furthermore, an application programming interface
cleanly defines the functions provided by each layer. The layered approach to structuring the software services has allowed
us to experiment with different protocols and to support different semantics for a service in each layer.

Thus far this work has been focused mostly on asynchronous distributed protocols. With the exception of a synchronized
clock service and the periodic replication service, the protocols in the testbed provide no hard real-time guarantees. We
are currently investigating a variation of the proposed software architecture tailored as a testbed for experimenting in
time-critical systems.


ABSTRACT

The  complexity  of  mission-critical  computer  systems  has  grown  rapidly  in  recent  years.  As  a 
result,  it  is  no  longer  possible  to  investigate  different  design  options  without  the  assistance  of 
computer-aided design (CAD) tools. In this paper, we briefly describe a synthesis system, called
SHARMA, that is being developed at the University of Wisconsin, Madison. SHARMA
(Synthesis of Heterogeneous Architecture for Real-time Mission-critical Applications) is a
set  of  design  tools  for  architectural  synthesis  of  computer  controllers  for  mission-critical 
applications. 


1  Introduction 

A  mission-critical  application  is  comprised  of  several  cooperating  tasks.  Each  task  may 
have  constraints  such  as  precedence,  release  time,  deadline,  resources,  performance,  and 
reliability.  For  instance,  in  a  surface  ship  radar  application  [2],  an  incoming  missile  must 
be identified within 0.2 seconds of its detection. If necessary, intercept missiles must be engaged
within 5 seconds after detection, and launched within 0.5 seconds of engagement.
Failure  to  meet  such  timing  constraints  may  have  catastrophic  consequences. 

One  way  of  meeting  these  constraints  is  to  make  use  of  a  distributed  computing  system, 
which  is  a  set  of  processors/nodes  interconnected  by  means  of  a  communication  network. 
Designing such a system involves: (i) choosing a set of building blocks such as processors,
interconnects and I/O interfaces, and (ii) mapping the tasks in the application onto these
building blocks. Not all designs using these building blocks will satisfy the constraints of
the application. Identifying designs which satisfy the constraints is very difficult due to the
large number of alternatives that can be constructed using the building blocks. Computer-aided
tools can simplify the identification of feasible designs by facilitating the search and
evaluation of several alternatives. Presented below is an outline of an integrated set of tools,
called SHARMA, for identifying feasible designs.

Figure 1: Example of a real-time application

2  Overview  of  SHARMA 

SHARMA is a set of tools for synthesizing heterogeneous computer systems for real-time
mission-critical applications. It is comprised of two main modules: a synthesizer and
a scheduler. The synthesizer is responsible for selecting a set of modules from given building
block components. The building block components may include processors of different
speeds, sensors, actuators, interconnects, I/O interfaces, etc. The scheduler is responsible
for verifying that the constraints of the application can be satisfied using the selected building
block components. Since the components may have different costs, the main objective
of SHARMA is to identify a low cost architecture that can meet the constraints of the
application.

Figure 2: Block Diagram of the SHARMA system

The application is specified to the tools as an annotated directed acyclic graph. The
vertices of the graph represent the tasks and the edges represent the precedence constraints.
Each vertex is annotated with the constraints of the corresponding task such as its
computational requirement, release time, deadline, and the set of required resources. Each
edge is annotated with the amount of information exchanged between the corresponding pair of
tasks. For example, consider the application in Figure 1. The application comprises three
periodic jobs with periods 30, 90, and 45, respectively. Jobs 1 and 3 have only one task
whereas Job 2 has six tasks. The figure shows the precedence constraints among all the tasks
that must complete in the time interval 0 to 90 (i.e., the least common multiple of the three
periods 30, 90, and 45). Note that the task graph is also annotated with a table which
specifies the computation requirements, the release times, the deadlines, and a list of
processor types on which the tasks can execute.
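The annotated task graph and the 0 to 90 scheduling interval can be illustrated with a short sketch. The class and function names below (Task, hyperperiod, unroll) and the sample task values are hypothetical and are not part of SHARMA; only the periods 30, 90, and 45 come from the example above.

```python
from dataclasses import dataclass, field
from functools import reduce
from math import gcd


@dataclass
class Task:
    """One vertex of the task graph with the annotations described in the text."""
    name: str
    computation: float            # computational requirement (work units)
    release: float                # release time
    deadline: float               # deadline
    processor_types: frozenset    # processor types the task can execute on
    successors: dict = field(default_factory=dict)  # successor name -> data volume


def hyperperiod(periods):
    """Least common multiple of the job periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def unroll(period, horizon):
    """Release times of one periodic job inside the interval [0, horizon)."""
    return list(range(0, horizon, period))


# Periods taken from the example application; the task itself is made up.
H = hyperperiod([30, 90, 45])
job1_releases = unroll(30, H)
sample = Task("T1", computation=5, release=0, deadline=30,
              processor_types=frozenset({"P1"}))
```

With these periods the interval length H is 90, so the single task of Job 1 appears three times in the unrolled graph and the task of Job 3 twice.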

In addition to the application, SHARMA is provided with a library of resources and
building block components available for synthesizing a suitable architecture. The cost and
performance are also usually specified for each component. From these inputs, the CAD tools
in SHARMA choose suitable components, design an interconnection between them, map the
real-time tasks onto the components, and identify a schedule that meets the constraints of
the application.

To develop the tools in SHARMA, the synthesis process has been split into four major
steps: module selection, interconnect optimization, task scheduling, and message scheduling.
Each of the steps is being implemented as a separate CAD tool which interacts with the
other tools to synthesize the desired system (see Figure 2). Presented below is a brief
description of the approach used in these CAD tools.

2.1  Module  Selection 

The  designer  specifies  the  set  of  resources  required  for  each  task.  For  each  resource, 
the  designer  also  specifies  a  library  of  different  types  of  the  resource.  These  types  differ 
in performance and cost. For example, a processor design library may contain a 20 MIPS
processor for $20, a 30 MIPS processor for $40, and a 40 MIPS processor for $60. Module
selection  is  the  process  of  selecting  the  types  and  the  number  of  units  of  each  resource  type 
required  to  satisfy  the  constraints  of  the  application.  The  current  version  of  our  module 
selection  tool  handles  multiple  types  of  one  resource  (e.g.  processor).  Extensions  which  can 
handle  multiple  types  of  more  than  one  resource  are  currently  in  progress. 

Our module selection algorithm consists of two major steps. In the first step, an
extension of the lower bound analysis in [1] is used to estimate the number of units of each
resource type and the number of communication channels required to satisfy the constraints.
Using these estimates, the algorithm constructs homogeneous architectures, one for each
type, and then invokes the task and message scheduler to determine whether the constraints
can be satisfied. Once feasible solutions have been identified, the second step reduces the
cost of the architectures by transforming them into heterogeneous systems. This is
accomplished by recursively replacing the higher cost units by lower cost units until it is no
longer possible to satisfy the constraints of the application. The lowest cost system which
satisfies all constraints is chosen as the final architecture.
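The second step can be pictured with the following minimal sketch. It is not the SHARMA implementation: the greedy replacement order is one possible reading of "recursively replacing", and the feasibility test is a stand-in (total MIPS against an assumed demand of 70) for the task and message scheduler that the real tool invokes. The library entries reuse the 20, 30, and 40 MIPS processors from the example above; the names P20, P30, and P40 are invented.

```python
# name -> (MIPS, cost), following the example design library in the text
LIBRARY = {"P20": (20, 20), "P30": (30, 40), "P40": (40, 60)}


def cost(arch):
    """Total cost of an architecture given as a list of unit type names."""
    return sum(LIBRARY[u][1] for u in arch)


def cheaper_types(unit):
    """Types that cost less than `unit`, cheapest first."""
    cheaper = [t for t in LIBRARY if LIBRARY[t][1] < LIBRARY[unit][1]]
    return sorted(cheaper, key=lambda t: LIBRARY[t][1])


def reduce_cost(arch, feasible):
    """Swap one unit at a time for a cheaper type while `feasible` still holds."""
    arch = list(arch)
    improved = True
    while improved:
        improved = False
        for i, unit in enumerate(arch):
            for t in cheaper_types(unit):
                trial = arch[:i] + [t] + arch[i + 1:]
                if feasible(trial):
                    arch, improved = trial, True
                    break
            if improved:
                break
    return arch


def toy_feasible(arch, demand=70):
    """Stand-in for the scheduler: enough aggregate MIPS (an assumption)."""
    return sum(LIBRARY[u][0] for u in arch) >= demand


start = ["P40", "P40"]                 # a feasible homogeneous architecture
final = reduce_cost(start, toy_feasible)
```

Under these toy numbers the homogeneous start costs 120 and the search stops at one 30 MIPS and one 40 MIPS unit, costing 100, because any further replacement falls below the assumed demand.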

2.2  Interconnect  Optimization 

There are many different types of interconnect components such as buses, point-to-point
links, and crossbars. The choice of interconnects should be based on the communication
patterns and the communication delay constraints of the application, which in turn depend
on the allocation and scheduling of the tasks in the application. However, since allocation
and scheduling of tasks cannot be done without knowledge of the interconnection,
choosing the optimal set of interconnects for a given set of building block components is
very difficult, if not impossible.

The current version of our interconnect optimization tool is limited to selecting one
or more multiple access channels. A lower bound analysis [1] is first used to estimate the
number of channels of a given type that is required to meet the communication needs of
the application. A recursive technique is then used to lower the cost of the interconnects.
This  technique  is  similar  to  the  one  used  in  the  module  selection  step. 

2.3  Task  Scheduler 

Several algorithms have been reported in the literature for scheduling tasks in real-time
applications [3-6]. The main problem with most of these algorithms is that they are
restricted to homogeneous processing units. Since we are dealing with a heterogeneous
system, a new algorithm was developed for scheduling and allocating heterogeneous processing
units. Currently, the algorithm assumes that the interconnection network is a set of
multiple access channels where each processing unit can access any channel. Other types of
interconnection networks will be dealt with in the future.

The  module  selection  and  interconnect  optimization  steps  provide  the  task  scheduler  an 
architecture  on  which  the  scheduling  is  to  be  performed.  The  task  scheduler  is  an  iterative 
list  scheduling  algorithm  that  repeatedly  invokes  the  message  scheduler  to  ensure  that  the 
communication  network  can  meet  the  timing  requirements  of  the  synthesized  schedule.  The 
basic  idea  of  our  task  scheduling  algorithm  can  be  explained  as  follows.  Each  task  is  initially 
assigned  to  a  processor  unit  on  which  it  can  execute.  This  assignment  specifies  an  execution 
time and the latest start time for each task. The assignment is then discarded and a new
assignment and schedule is generated as explained below.

A task is considered ready for scheduling when all its predecessors have completed their
execution. The ready tasks are ordered according to their latest start time. The ready task
with the least latest start time is first considered for scheduling. The task is scheduled on the
processor on which it can complete the earliest. To identify this earliest completion time,
all possible assignments for the task are considered. Each possible assignment generates
a different set of predecessor messages that have to be scheduled on the communication
network. For each possible assignment, the message scheduler is invoked to determine the
earliest completion time of all the predecessor messages of the task under consideration. The
least schedulable time after all predecessor messages have arrived plus the corresponding
execution time for the task is the earliest completion time of the task on a given processor.
The minimum earliest completion time over all possible processors is the time at which the
task is scheduled; the task is scheduled on the corresponding processor. After a task is
scheduled, the ready list is updated to possibly include the immediate successors of the just
scheduled task.

The  algorithm  continues  in  this  fashion  until  all  tasks  have  been  tentatively  scheduled.  If 
some  tasks  do  not  meet  their  deadlines,  then  the  whole  process  is  repeated  after  recomputing 
the  latest  start  time  of  all  tasks  based  on  the  new  processor  assignment  just  generated.  Note 
that,  a  new  assignment  results  in  a  different  execution  time  and  different  communication 
pattern  for  some  tasks.  Consequently,  there  is  a  change  in  the  latest  completion  time  of 
some  tasks,  which  in  turn,  changes  the  priority  order  among  the  tasks  in  the  next  iteration. 
The algorithm terminates either when a feasible schedule is identified or when a
designer-specified iteration limit is exceeded.
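One pass of the list scheduling procedure described in this section might look like the sketch below. It is a simplification under stated assumptions, not the SHARMA scheduler: message delays are ignored (the real algorithm calls the message scheduler for every candidate assignment), the latest start times are derived by propagating deadlines backwards through the graph, and the task set, processor names, and speeds are invented for illustration.

```python
def latest_start_times(tasks, exec_time):
    """LST(t) = min(own deadline, LST of every successor) - execution time."""
    lst = {}

    def visit(t):
        if t not in lst:
            bound = min([tasks[t]["deadline"]] + [visit(s) for s in tasks[t]["succ"]])
            lst[t] = bound - exec_time[t]
        return lst[t]

    for t in tasks:
        visit(t)
    return lst


def list_schedule(tasks, speeds, lst):
    """Schedule ready tasks in order of least latest start time.
    Returns {task: (processor, start, finish)}."""
    preds = {t: [p for p in tasks if t in tasks[p]["succ"]] for t in tasks}
    free = {p: 0.0 for p in speeds}      # time at which each processor is idle
    done = {}
    while len(done) < len(tasks):
        ready = [t for t in tasks
                 if t not in done and all(p in done for p in preds[t])]
        t = min(ready, key=lambda x: lst[x])
        earliest = max([tasks[t]["release"]] + [done[p][2] for p in preds[t]])
        best = None
        for proc, speed in speeds.items():
            start = max(earliest, free[proc])
            finish = start + tasks[t]["work"] / speed
            if best is None or finish < best[2]:
                best = (proc, start, finish)
        done[t] = best
        free[best[0]] = best[2]
    return done


# Hypothetical three-task job on one fast and one slow processor.
tasks = {
    "a": {"work": 4, "release": 0, "deadline": 10, "succ": ["b", "c"]},
    "b": {"work": 2, "release": 0, "deadline": 10, "succ": []},
    "c": {"work": 2, "release": 0, "deadline": 8, "succ": []},
}
speeds = {"fast": 1.0, "slow": 0.5}
initial_exec = {t: tasks[t]["work"] / speeds["fast"] for t in tasks}
lst = latest_start_times(tasks, initial_exec)
schedule = list_schedule(tasks, speeds, lst)
feasible = all(schedule[t][2] <= tasks[t]["deadline"] for t in tasks)
```

In the full algorithm, an infeasible pass would be followed by recomputing the latest start times from the assignment just produced and repeating, up to the iteration limit.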

2.4  Message  Scheduler 

The message scheduler is invoked by the task scheduler to determine the earliest
completion time of all predecessor messages of a given task. The message scheduler orders the
predecessor messages according to their earliest start times. The messages are then
considered for scheduling in increasing order of their earliest start times. Each message is
scheduled on the communication channel on which it can complete the earliest. To determine
the earliest completion time, each channel is individually considered and the message
is scheduled at the earliest possible time on that channel. The completion time on a channel
is the earliest possible time plus the time required for communication. The message is
finally scheduled on the channel with the minimum earliest completion time.
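The message scheduling rule can be sketched as follows. The function name, the message tuples, and the per-channel transfer rate used to obtain "the time required for communication" are assumptions made for illustration; the paper does not specify how the communication time is computed.

```python
def schedule_messages(messages, channel_free, rate):
    """messages: list of (name, earliest_start, size).
    channel_free: {channel: time at which the channel becomes idle}.
    rate: {channel: size units transferred per time unit} (assumed model).
    Returns ({name: (channel, start, finish)}, completion time of all messages)."""
    free = dict(channel_free)
    placed = {}
    # Consider messages in increasing order of earliest start time.
    for name, est, size in sorted(messages, key=lambda m: m[1]):
        best = None
        for ch, t_free in free.items():
            start = max(est, t_free)
            finish = start + size / rate[ch]
            if best is None or finish < best[2]:
                best = (ch, start, finish)
        placed[name] = best
        free[best[0]] = best[2]
    all_done = max((f for _, _, f in placed.values()), default=0.0)
    return placed, all_done


# Hypothetical predecessor messages of one task on two identical channels.
msgs = [("m1", 0, 4), ("m2", 1, 2), ("m3", 2, 2)]
placed, all_done = schedule_messages(msgs, {"c1": 0.0, "c2": 0.0},
                                     {"c1": 1.0, "c2": 1.0})
```

The second return value is what the task scheduler would use as the earliest time at which all predecessor messages have arrived.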


3  Preliminary  Results 


In this section, we present the results generated by the tools in SHARMA for two
example real-time applications. The first example is the one shown in Figure 1 except that
only two types of processors are considered. It is also assumed that all tasks can run on
either type of processor. A processor of type 1 has a normalized performance of 1 computation
per time unit whereas a processor of type 2 has a normalized performance of 0.5 computations
per time unit. The cost of a processor of type 1 is assumed to be 4 and that of a processor of
type 2 is assumed to be 1. Further, it is assumed that the cost of a communication link is 0.5.
Figure 3(a) shows a homogeneous architecture and the corresponding schedule which meets
the constraints of the application. In this architecture, there are two processors of type 1
and no processor of type 2. The overall cost of this architecture is 8.5. Figure 3(b) shows
a heterogeneous architecture and the corresponding schedule which meets the constraints
of the application. In this architecture, there is one processor of type 1 and two processors
of type 2. Note that the overall cost of this architecture is only 6.5 as compared to 8.5 for
the homogeneous architecture.

Figure 3: Architectures synthesized by SHARMA for the example application in Figure 1:
(a) homogeneous architecture, (b) heterogeneous architecture

Figure 4: Architecture synthesized by SHARMA for a large example application

The second example presented here is an application with 100 tasks. The precedence
relations between the tasks were generated randomly. The tasks with no successors were
assigned deadlines equal to the length of their critical paths on the slowest processor. The
design library used in the synthesis process had three types of processors with normalized
performance of 1.0, 1.2, and 1.5, and normalized cost of 1.0, 1.3, and 2.0, respectively. The
synthesized heterogeneous system and the corresponding feasible schedule generated by
the tools in SHARMA are shown in Figure 4. Note that the synthesized design has three
processors of type 2 and two processors of type 3. Also, three communication channels were
required to meet the constraints of the application.
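The costs quoted for the first example can be checked with a few lines of arithmetic. The sketch assumes that each of the two architectures uses exactly one communication link; the text does not state the number of links, but one link at cost 0.5 is what makes the quoted totals of 8.5 and 6.5 come out.

```python
PROCESSOR_COST = {1: 4.0, 2: 1.0}   # cost per processor type, from the first example
LINK_COST = 0.5                     # cost of one communication link


def architecture_cost(processors, links=1):
    """processors: {type: number of units}; links: number of links (assumed 1)."""
    units = sum(PROCESSOR_COST[t] * n for t, n in processors.items())
    return units + LINK_COST * links


homogeneous = architecture_cost({1: 2, 2: 0})     # two type 1 processors
heterogeneous = architecture_cost({1: 1, 2: 2})   # one type 1, two type 2
saving = homogeneous - heterogeneous
```

Under that assumption the heterogeneous design saves 2.0 cost units, roughly a quarter of the homogeneous cost.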


4  Conclusion 

We presented an overview of the set of CAD tools being developed at the University of
Wisconsin-Madison for synthesizing heterogeneous computer systems for real-time
mission-critical applications. Preliminary results obtained by these tools were also presented.
Future work includes enhancing these tools and evaluating their performance on other
applications.


Abstract

A crucial requirement of many Navy systems is the need to perform critical functions within
specified real-time deadlines during stress situations. Current system design support methodologies and
tools are insufficient for the complexity encountered in complex real-time system development. Support
techniques inadequately communicate requirements and do not assist in the complex task of identifying
critical components and problem areas.

This paper discusses a prototype Intelligent Real-Time System Assessment Tool (ExpeR/T) that
integrates requirements specification, system design, design analysis, and design evaluation. It provides the
client and engineer with an automated tool that identifies critical and problem areas in a design and
suggests improvements. The tool approach utilizes an innovative combination of expert system analysis,
graphical system design, and design modeling/simulation. The project leverages the commercial
ProTEM graphical system design diagramming and simulation tool.

Introduction 

The ExpeR/T tool performs a variety of functions. The functions of the system are segmented into
the following categories:

□ Requirement specification - allows the specification of requirements through an expandable menu system.
Requirement details are specified in a template that allows linkage to design elements.

□ Design performance and problem analysis - is made up of performance analysis, critical component
identification, and problem analysis. Performance analysis includes a time criticality factor/strategy.
Critical component identification includes time focus factor, critical path analysis, and critical state
analysis. Problem analysis consists of condition search and rule-based analysis.

□ Problem and critical component correction assistance - provides assistance for problems and critical areas
identified by the design performance, critical component, and problem analysis modules. An expert system
knowledge base is used to designate appropriate designer assistance.

□ Graphical design and simulation - allows system developers to use a graphical design diagram and
simulation tool. This component is based on ProTEM advanced Petri-net-based commercial software.

□ Automated conversion of DFDs and compatible design diagrams to ProTEM Petri nets - gives compatibility
with complementary system design methodologies.

□ Design quality evaluation - provides metrics for the evaluation of system designs. The analysis techniques
include behavior analysis, resource loading, and weighted/combinatorial factors.

APPROACH 

Current complex real-time systems development methods, and the computer-aided design tools
supporting them, rely heavily on an informal and inaccurate process for communicating requirements
among the client, developing contractor, user, and contracting authority. This has resulted in systems
whose performance is inconsistent with requirements and which require expensive revisions late in the
development life cycle.

Researchers who have noted the inadequacy of current tools for complex system development have
suggested second generation metaCASE tools that capture requirements, simulate system designs, and
advise designers on-line [Forte 92]. The Intelligent Real-Time System Assessment Tool (ExpeR/T)
described here addresses these needs. The tool is designed to address the factors that contribute to complex
system development problems. It also has the flexibility to address new problem areas as they are
encountered. ExpeR/T provides an integrated means of defining, designing and analyzing complex
real-time systems which incorporates formal methods, informal methods, evaluation schemes, and engineering
experience. But more importantly, it provides the client and engineer with suggested improvements for
detecting problems with a system model. One of ExpeR/T's primary objectives is to aid in the identification
of critical areas early in the development life cycle to provide a strategic advantage to both client and
developer.

ExpeR/T's larger capability set is the result of an innovative combination of methods, techniques
and concepts which are currently treated in the technical literature as being disjoint and separate. For
example, software structured analysis through data flow diagrams (DFDs) and compatible representations
encapsulated in CASE tools, systems analysis using a variety of formalized techniques, and requirements
specification are all treated as disjoint concerns. ExpeR/T combines these concepts through integrated
English-like requirement specification, graphical system design, design analysis, and design modeling/
simulation. It enables engineers to realize their full potential by assisting them in focusing their attention
on those aspects of system definition, analysis and design for which human beings are most well equipped,
that is, the solution of poorly structured, poorly understood problems. It provides an objective means of
evaluating these solutions by identifying and quantifying critical areas, as well as suggesting improvements.

Requirements Specification

The requirements component allows selection of "factors" from a menu. Selecting a factor from
the menu leads to a list of sub-factors allowing further selection, etc. This hierarchy is demonstrated in
Figure 1. The taxonomy of factors and subfactors is essentially derived from those defined in [Nguyen 92].

Details for a subfactor can be defined using a "template" in the spirit of [Nguyen 92], but with a
looser format. This template also allows definition of a connection to design items (i.e. Petri-net
transitions) as shown in Table 1. For example, if a requirements function is defined as beamformer,
then the performance criterion of 23 beams per second can be mapped into the group of Petri-net design
diagram items that implement the beamformer. Potentially, the Petri-net model can generate an
actual beams per second performance number from the group of design items fulfilling the requirement
(given a set of input circumstances). The actual performance can then be compared to the specified
beams per second, which generates a performance metric. If the function is defined as "system",
signifying the entire system, then the overall Petri-net model would be used to generate performance
numbers.

Figure 1: Requirements taxonomy hierarchy example


Name:                  Reliability
Function:              beamformer
Type:                  Numeric
Range:                 >0
Units:                 Seconds
Measure:               Mean time between failures (MTBF)
Connection to Design:  Transition XY, Transition XYZ
Additional:            Rationale - life critical function

Table 1: Requirement subfactor detail template
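As an illustration only, the template of Table 1 could be held in a small record type such as the one sketched here. The class and field names are hypothetical and do not come from ExpeR/T; the field values are the ones listed in Table 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubfactorRequirement:
    """One filled-in requirement subfactor template (fields follow Table 1)."""
    name: str
    function: str                           # requirement function, or "system"
    value_type: str
    value_range: str
    units: str
    measure: str
    connection_to_design: Tuple[str, ...]   # linked Petri-net transitions
    additional: str = ""

    def applies_to_whole_system(self):
        """True when the requirement refers to the overall Petri-net model."""
        return self.function == "system"


reliability = SubfactorRequirement(
    name="Reliability",
    function="beamformer",
    value_type="Numeric",
    value_range=">0",
    units="Seconds",
    measure="Mean time between failures (MTBF)",
    connection_to_design=("Transition XY", "Transition XYZ"),
    additional="Rationale - life critical function",
)
```

Keeping the linked transitions as part of the record is what would let an analysis step look up the design items whose simulated performance should be compared with the requirement.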


The ExpeR/T program is driven by a menu user-interface. The level 1 menu is shown in Figure 2.
Submenus to the Requirements option successively proceed down the requirements subfactor hierarchy.

Performance Analysis

The approach for performance analysis in ExpeR/T allows definition of performance
characteristics for design components (transitions) in Petri-net design diagrams. Specification of
performance characteristics for requirement factors/subfactors is also allowed. At some point in the
system development process the designer can connect requirement functions to Petri-net transitions. The
performance analysis component then takes the performance model generated by the Petri-net and compares
it to the required performance for the appropriate function.

An expert system rule set in the performance analysis component analyzes the relevance and
criticality of the model analysis deviations from the specified performance requirements. These rules
identify deviations that indicate important problems and report them as alerts to the user. For example,
a large set of inputs might be applied to the Petri-net and in some outlying cases, 1% total, response time
requirements cannot be met. In this case a rule can specify that the user only be notified in a "warning"
in a detailed hard-copy listing of the analysis, as opposed to an on-line alert that is also highlighted as
an alert in the hard-copy. For example:

IF response requirement deviates from model simulation

AND quantity of deviations is less-than 5% of test cases

THEN report deviation in warning only
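The example rule can be expressed as a small function. This is only a sketch of the rule's logic, not ExpeR/T's rule language: the function name, the "ok" outcome for zero deviations, and the decision to escalate anything at or above the threshold to an alert are assumptions.

```python
def classify_deviation(deviating_cases, total_cases, warning_threshold=0.05):
    """Decide how response-time deviations found in simulation are reported.

    Returns "ok" when no test case deviates, "warning" when fewer than
    `warning_threshold` of the test cases deviate, and "alert" otherwise.
    """
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    if deviating_cases == 0:
        return "ok"
    fraction = deviating_cases / total_cases
    return "warning" if fraction < warning_threshold else "alert"


# The outlying 1% case from the text: reported in the warning listing only.
outlier_report = classify_deviation(deviating_cases=1, total_cases=100)
# A hypothetical 10% deviation rate would be raised as an on-line alert.
serious_report = classify_deviation(deviating_cases=10, total_cases=100)
```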

Critical  Component  Identification 

Critical component identification analyzes the results of the performance analysis module to
identify components of the design that create bottlenecks in processing or are in the critical path.
Bottleneck components are those which cause higher speed processing in associated components earlier in
the processing stream to wait for processing in the subject component. A critical path component is one
which stands alone in an important processing stream and which could potentially disrupt processing in
the event of failure or low performance.

There are three specific "figures of merit" that are utilized to identify critical components and
processing bottlenecks. These include the time focus factor, critical path analysis, and critical state
analysis. Time focus factor identifies those 20% or so critical components that are responsible for 80% or
so of system execution time. With critical path analysis, simulation determines which paths (patterns of
state traversals) occur most and least frequently. The most frequently occurring paths imply a set of
components which require the utmost care to ensure system integrity. Critical state analysis uses
simulation to help identify those states that occur most frequently, which indicates the most active and
therefore critical components. Critical state analysis is further enhanced by an expert system program that
determines critical components. The expert system rules adjust the component identification based on the
degree to which critical states occur more frequently than other states.
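A time focus factor of this kind can be approximated by ranking components by their share of simulated execution time. The sketch below is one plausible reading, not the ExpeR/T algorithm: the function name, the 80% default, and the component names and timings are all invented for illustration.

```python
def time_focus_components(exec_time, share=0.8):
    """Smallest set of components, taken in decreasing order of execution
    time, that together account for at least `share` of the total time."""
    total = sum(exec_time.values())
    chosen, covered = [], 0.0
    for name, t in sorted(exec_time.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        chosen.append(name)
        covered += t
    return chosen


# Hypothetical execution times accumulated over a simulation run.
profile = {"fft": 55, "beamform": 30, "display": 8, "log": 4, "io": 3}
critical = time_focus_components(profile)
```

With this made-up profile, two of the five components cover 85% of the execution time and would be flagged for closer attention.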

Problem  Analysis 

This analysis identifies problem areas in a system design. This problem analysis is performed by
algorithms and expert system rules. The areas of analysis performed include condition search and
rule-based analysis. Condition search enables the designer to specify a set of unwanted design conditions
which are searched for in the Petri net design. For example, if Slot 7 represented the condition
"EjectionSeatArmed" and Slot 41 represented the condition "WeightOnWheelsTrue", the user would select
these two conditions and the tool determines if this potentially unsafe phenomenon could occur.
Rule-based analysis utilizes requirements to identify problems such as violated reliability constraints,
unheeded error recovery considerations, and excessive cost factors. The design diagram (Petri net)
supports the definition of needed characteristics such as error recovery and cost.
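Condition search can be viewed as a reachability question on the Petri net: is there a reachable marking in which all of the selected places hold a token? The sketch below answers it by breadth-first search over markings of an ordinary Petri net. It assumes a small bounded net with unit arc weights; the two toy nets, the place "SeatSafe", "Airborne", and the transitions are hypothetical, and only the two condition names come from the text.

```python
from collections import deque


def fire(marking, transition):
    """Marking after firing `transition`, or None if it is not enabled.
    A transition is a pair (input places, output places) with unit weights."""
    inputs, outputs = transition
    if not all(marking.get(p, 0) >= 1 for p in inputs):
        return None
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new


def condition_reachable(initial, transitions, unwanted, limit=10_000):
    """True if some reachable marking has a token in every `unwanted` place."""
    def key(m):
        return frozenset((p, n) for p, n in m.items() if n)

    seen = {key(initial)}
    queue = deque([initial])
    while queue and len(seen) <= limit:
        m = queue.popleft()
        if all(m.get(p, 0) >= 1 for p in unwanted):
            return True
        for t in transitions:
            nxt = fire(m, t)
            if nxt is not None and key(nxt) not in seen:
                seen.add(key(nxt))
                queue.append(nxt)
    return False


initial = {"WeightOnWheelsTrue": 1, "SeatSafe": 1}
unwanted = ["EjectionSeatArmed", "WeightOnWheelsTrue"]
takeoff = (["WeightOnWheelsTrue"], ["Airborne"])
disarm = (["EjectionSeatArmed"], ["SeatSafe"])
# Design 1: the seat can be armed at any time.
unguarded = [takeoff, disarm, (["SeatSafe"], ["EjectionSeatArmed"]),
             (["Airborne"], ["WeightOnWheelsTrue"])]
# Design 2: arming needs Airborne, landing needs the seat to be safe.
guarded = [takeoff, disarm,
           (["SeatSafe", "Airborne"], ["EjectionSeatArmed", "Airborne"]),
           (["Airborne", "SeatSafe"], ["WeightOnWheelsTrue", "SeatSafe"])]
unsafe_in_design_1 = condition_reachable(initial, unguarded, unwanted)
unsafe_in_design_2 = condition_reachable(initial, guarded, unwanted)
```

Exhaustive search like this is only practical for small bounded nets; the `limit` argument simply stops the exploration once too many markings have been seen.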


Figure 2: Level 1 menu with pulled-down level 2 menu.

Problem and Critical Component Correction Assistance

Problems and critical areas identified by the design performance, critical component, and problem
analysis modules are further processed to report applicable designer assistance. A rule set includes
knowledge about how to resolve problem areas and modify designs to incorporate these problem
resolutions. An example of the type of rule that provides correction assistance is as follows:

IF component_x is identified as a bottleneck

AND component_x is software

THEN suggest dataflow and processing division onto two hardware resources

Graphical Design and Simulation

Commercial ProTEM capability is incorporated into the ExpeR/T tool to yield real-time system
graphical design and simulation facilities. The supported Petri net methodology is based on Petri net theory
and is the result of extending this theory in order to make it more practical and effective in complex systems
development. This extended form of Petri net technology enables users to model all facets of complex
systems including sequentiality, concurrency, asynchrony and specific capabilities for real-time systems
such as priorities and timing. The following section discusses the Petri net objects and rules in slightly
more detail.


Automated  Conversion  of  Design  Diagrams  to  ProTEM  Petri  nets 

Dataflow and other design diagrams in all of their various forms and variations represent the most
widely used means of describing real-time and other software systems. Although they have proven
themselves to be valuable in describing the information flow relationships within a system, they do not
contain sufficient detail to enable an engineer to gain insight into the performance of the modeled system.
The technology required to do that is available in the form of Petri nets.

The advantages that Petri nets in their extended or enhanced form (e.g. the form used in the
ProTEM system) have over data flow diagrams in the area of real-time strategic and tactical systems
include their ability to model parallel and asynchronous operations, account for probabilities, portray
priorities and monitor and detail the utilization of resources (e.g. people, devices, software components).

The biggest problem facing anyone who wishes to use Petri nets in any form is the fact that the
population of objects in the net is not easily surmised from a cursory inspection of a system level diagram.
The paradox which exists here is that data flow diagrams are easy to use and understand but do not provide
sufficient information for us to be able to determine whether or not a particular system design will work
and if so to what extent, while extended Petri nets can tell us what we need to know but they are not
intuitively obvious. The purpose of this design is to detail how the gap can be bridged.

Petri nets are composed of Tokens (indicators of a condition being true), Places (used to specify
the status of a condition) and Transitions (processes which transform one or more Tokens which
were input to the Transition into one or more Tokens on the output side). Extensions to Petri net
technology have been instituted [Peters 92, Peters 93] which transform Petri nets into a powerful tool
for modeling real-time systems. These enhancements include timing, token typing, multiple behaviors,
non-homogeneous places, priority and resources. An example of one type of Petri net graphics (without
attribute details) is presented in Figure 3.

The process of transforming DFDs and other compatible design diagrams into ProTEM Type
Petri Nets (ProNETs) may be summarized as shown in Figure 4 and as detailed in the following steps:

□ Identify the population of processing objects, Transitions, which comprise the system

□ For each object, determine what conditions and resources are necessary for the Transition to execute

□ [illegible] conditions in the form of Places and form a network

□ [illegible in source]

Figure 3: Example of one form of Petri net graphics

Figure 4: Overview of design diagram to ProNET conversion


Design Quality Evaluation

The use of several software design quality evaluation figures of merit can greatly enhance the
software engineer's ability to quickly and cost effectively assess the relative merits of one design over
another. This is true of both new, yet to be built systems, and existing ones. Figures of merit included
in ExpeR/T that enable this evaluation rely heavily on the results which the ProTEM Petri net makes
available. Design quality evaluation figures of merit in ExpeR/T include behavior analysis, resource
loading and weighted/combinatorial factors.

Behavior analysis relies on the observation that the wider the range of behaviors provided by a
component, the higher the likelihood that the component will fail [Halstead 75]. This is because multiple
behaviors imply poorly thought out pseudocode or logic and/or complex interfaces to other components.
Using simulation runs, behavior analysis breaks out and assigns overall system behaviors to individual
system components. Inordinately large behavior factors for some components imply that further
investigation into decomposing the component is warranted.

Resource loading analysis rests on the observation that use of a resource (e.g. communications
bus) by a component constitutes a potential blocking situation. One way to avoid this is the use of a
prioritization scheme whereby the highest priority tasks will gain access to the needed resources when those
resources are needed. Two important factors to consider here are 1) What makes a component a high
priority component - what it does or what it is processing? and 2) Can preemption cause a failure?

By multiplying or combining some of the individual analysis factors mathematically, the
differences between and among various system components can be made more pronounced.
Weighted/combinatorial factors can include multiplying the figure of merit for the most used component by
the number of times it gets preempted or it preempts another component.
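A weighted/combinatorial factor of the kind just described amounts to a product of two per-component measures. The sketch below applies the product to every component rather than only the most used one; the function name, the utilization figures, and the preemption counts are invented for illustration.

```python
def weighted_factor(usage, preemptions):
    """Multiply each component's usage figure of merit by the number of
    times it was preempted or preempted another component."""
    return {c: usage[c] * preemptions.get(c, 0) for c in usage}


# Hypothetical simulation results: utilization and preemption counts.
usage = {"bus_driver": 0.6, "tracker": 0.5, "logger": 0.1}
preemptions = {"bus_driver": 2, "tracker": 12, "logger": 1}
scores = weighted_factor(usage, preemptions)
most_pronounced = max(scores, key=scores.get)
```

Here the usage figures alone barely separate the first two components, while the combined factor singles out the tracker because of its high preemption count.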


CONCLUSIONS

During a Phase I feasibility study, we successfully developed the conceptual requirements,
high-level design, and a proof-of-concept prototype for the ExpeR/T intelligent real-time system assessment
tool. The prototype is operational with some features completely implemented. It shows that the system
design and assessment tool concept is viable and that its full implementation is feasible. The benefits of
successfully developing the ExpeR/T tool will be improved complex real-time systems development
through automated requirements specification, systems design, critical area identification, and design
evaluation. ExpeR/T can be applied to computer systems development for DoD, Federal agencies, and
commercial industries including avionics, nuclear systems and telecommunications.


1. Introduction


This paper discusses the design of an environment for evaluating performance of parallel systems.
It results from a Phase I research project funded by the Naval Underwater Systems Center
(NSWC) under the SBIR program. Three technical objectives were identified: (1) find a convenient
way to describe and synthesize a large class of applications, (2) similarly, define a method to
describe and synthesize a large class of parallel systems, and (3) find a convenient method for
mapping application programs to the system nodes and predict the performance of the resulting
system. In addition to showing the feasibility of the above objectives, the research resulted in the
development of a prototype of the environment. The prototype, based on an Innovative Research
design evaluation tool, established the feasibility of the first two objectives and developed the
outlines of the approach for the third. This paper discusses these findings and plans for its Phase II
development. A proposal for developing many of the functionalities discussed in this paper has
been submitted to NSWC.

2. Background

It is becoming increasingly evident that parallel systems will play a major role in meeting the
computational needs of large scientific, military, and commercial applications. This is partially due
to the rapid increase in the processing requirements of large applications, which are growing at a
rate faster than is provided for by the introduction of new processor generations. A simple example
serves to illustrate the point.

To solve a two dimensional Poisson equation using finite differences on an n x n grid requires the
solution of a set of linear equations of the type Au = f where A is a square matrix of dimension n^2.
The number of operations using the crudest Gaussian Elimination solution technique will be of the
order of n^6. For a problem of reasonable size, say n = 1,000, the number of operations is therefore
of order 10^18. The magnitude of this problem is realized when we consider that a fast CRAY Y-MP
performs at 2 GFLOPS, i.e. 2 x 10^9 operations per second. Thus to solve the problem on this
machine requires roughly 16 years. Most real applications are more complex than this example. For
example, solving a three dimensional Poisson problem increases the computational requirement
and solution time by a factor of n^3. Of course, there are a number of techniques (e.g. iterative
solution or FFT based methods), and properties of the problem (e.g. sparsity of the matrix) that
can significantly reduce the processing time - to as low as O(n^2 log(n)) operations in this special
case. However realistic numerical simulations often require the repeated solution of such a problem
perhaps thousands of times (e.g. the number of timesteps in a fluid simulation). Further, the
memory requirements of the problem are very high and will limit the choice and performance of an
architecture. This simple example illustrates that even on the fastest vector machines, it is quite
difficult to solve large problems in a reasonable time. Thus, alternative options must be considered.
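The estimate above is simple enough to check numerically. The sketch below assumes the operation count is exactly n^6 (constant factors are ignored) and treats the machine rate as a sustained figure; the function names are illustrative only. At 2 x 10^9 operations per second the dense solve takes about 5 x 10^8 seconds, on the order of 16 years, while at a sustained 10^8 operations per second it takes about 10^10 seconds, or roughly 317 years.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def dense_elimination_ops(n: int) -> float:
    # An n x n grid gives n**2 unknowns; dense Gaussian elimination on a
    # matrix of that dimension costs on the order of (n**2)**3 = n**6 operations.
    return float(n) ** 6


def fast_solver_ops(n: int) -> float:
    # Order-of-magnitude count for the O(n^2 log n) special-case solvers.
    return n ** 2 * math.log2(n)


def solve_time_years(ops: float, ops_per_second: float) -> float:
    return ops / ops_per_second / SECONDS_PER_YEAR


n = 1000
years_at_2_gflops = solve_time_years(dense_elimination_ops(n), 2e9)
years_at_100_mflops = solve_time_years(dense_elimination_ops(n), 1e8)
fast_seconds_at_2_gflops = fast_solver_ops(n) / 2e9
```

The same arithmetic shows why the choice of algorithm matters more than the machine: the O(n^2 log n) count for n = 1,000 is about 10^7 operations, a small fraction of a second at either rate.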

The most promising alternative so far has been parallel systems. A number of such systems have
been introduced and the number is increasing rapidly. Recent experience has shown that parallel
systems are no longer entirely experimental, and that useful systems have already been developed
(e.g. the Connection Machine CM-2 and CM-5, Intel iPSC/860, etc.). Furthermore, it has been
shown that they can be used to solve real problems (e.g. linear algebra, partial differential
equations, etc.) and at performances many times faster than a conventional vector computer such as
the Cray Y-MP/8 (reference [<S] describes 24 Gflops computations on a CM-5). However, it has
not been shown that such systems can be used as general purpose computers, and in fact they may
never be. The relevant question is not whether a Teraflops system can be built, but rather how such
powerful systems can be used effectively. In many cases only a small fraction of a system's
potential power is utilized. Poor performance of such systems can be attributed to one or more of
the following causes:

• The incompatibility of the application and the architecture is a major cause of poor performance.
For example, to solve a small to moderate sized set of linear equations, LU decomposition may
work well on vector machines such as CRAY, while the same algorithm may perform poorly on a
massively parallel machine. Generally, non-structured applications with non-homogeneous
subproblems are poor candidates for implementation on massively parallel systems, which
are most efficient when performing identical lock-step operations.

• The application's decomposition into tasks and their mapping onto processors have not been
performed optimally. To fully exploit the potential power of these systems the application must
be decomposed with maximum parallelism into tasks (grains) and mapped to processors in some
optimal fashion. The decomposition strategy and the subsequent assignment of tasks is a major
determinant of system performance, since the order of processing determines how tasks
communicate with each other and how synchronization among tasks will proceed.


In the absence of good methodologies, benchmarking has been widely applied as an evaluation
tool. But benchmarking is highly application specific and its results cannot be generalized. In most
cases, performance can vary widely and the benchmarking results must be manually manipulated to
optimize performance. The basic idea of benchmarking is to define a number of kernels (or pieces
of code) which can represent a wide range of applications. The character and the choice of these
representative codes can vary widely, but they fall into four categories: synthetic such as
Whetstone and Dhrystone, kernels such as Livermore Loops, algorithmic such as LINPACK, and
application such as Perfect Club. Benchmarking has gained wide acceptance because the alternative
approaches of analytical modeling or discrete event simulation are not feasible beyond the simplest
examples, and do not offer much hope. Analytical models are too difficult to build and generally
not very accurate. Discrete event simulation quickly becomes too complex and unwieldy. But
benchmarking, in its present form, also fails to address several important issues it was originally
designed for. Although considerable attempts have been made to produce benchmarks which are
more representative of real workload, benchmarking still remains inadequate as an evaluation tool.

Thus, in spite of its widespread use, benchmarking is not the method of choice for measuring (and
more importantly predicting) the performance of an application on a parallel system. This is
especially true for the end users, whose primary interest is the performance of their applications.


In view of the above discussion, we proposed to determine the feasibility of developing an
environment that can be used to conveniently define parallel architectures, design and decompose
applications, map their tasks into the system nodes, and evaluate the performance of the system.
A major consideration in such an undertaking is that it must serve a diverse group of users with varied
background and expertise who will rely on the environment to evaluate the performance of an
application. The development of the environment must proceed with this in mind, and must
possess tools required to support these and other needs. Flexibility is an important consideration as
the environment will be used to analyze a varied class of architectures. Flexibility has a broad
interpretation, covering the ability to define a wide range of architectures, applications,
performance measures, and operational concepts (e.g., various types of scheduling strategies,
synchronization scenarios, parallel algorithms, etc.). Flexibility is also reflected in the number of
options the user will need to design and execute a model. The definition of events, identification of
resources to be analyzed, partitioning of the problem, the granularity of the tasks, and their
assignments to processors are just a few of these options essential to developing flexible models.

Figure 1. Opening window of EAPS

4.1. Parallel System Definition

All system elements are defined hierarchically. At the highest level a parallel system is a node in a
larger system, and thus can be connected to a front-end processor (host), storage devices, or
workstations. At this level one needs to only specify the global attributes of the system such as
name, speed of processor at a node, speed of communication processor at a node, number of
virtual processors per node, number of nodes, memory per node, network protocol, etc.

The second level decomposition of an architecture can be done either by defining a new
decomposition, using a previously defined decomposition, or using a library decomposition.
Decomposition of a parallel system consists of defining the node structure and applying it to
various nodes, specifying the communication network, and assigning application tasks to nodes.

Two forms of assignment can be performed: node and application decompositions. Two grids are
provided: one to specify the nodes to which a given (node) decomposition applies, and the other to
specify the nodes to which a program is assigned. Initially, each grid represents all nodes of the parallel
system, and hence a program (or a node decomposition) assigned to the system is assigned to all nodes. The
assignment of an entity can be applied to a subclass of nodes by progressively partitioning the
grids. Each action will partition the nodes into two equal parts, and thus after N actions, each
partition represents M/2^N nodes, where M is the total number of nodes. The partitioning action
can be reversed. The assignment of an entity (a node decomposition or a program) to a subclass
will assign it to all nodes within that subclass. This capability allows for the representation of
SIMD, MIMD, SPMD, and mixed architectures. Selection of a task will highlight its partition. An
alternate way of partitioning the nodes, and arbitrary decomposition of a system with arbitrary
number of nodes in each subclass will be accommodated via scripting.

Scripting is a mechanism whereby a previously created script file may be provided which defines a
detailed node decomposition, assignment, and/or network description. Also it is possible to ask the
system to generate a script for any constructed node, including builtin defaults or library choices.
Scripting is a powerful and versatile utility in EAPS with a number of key applications. Its most
important use is in assigning decomposed nodes and application programs to arbitrary groups of
nodes. Scripts are sets of code developed by the user specifying how system entities (programs,
node decompositions, etc.) should be assigned. EAPS will provide the capability for writing
Scripts and can read (i.e. execute) them upon user's command. For Phase II, scripting will be
limited to the development and the execution of a previously defined set of code to assign programs
or a node decomposition to an arbitrary class of nodes.

The assignment of a decomposed node (or a program) to a group of nodes is accomplished by
selecting the entity and assigning it to a desired partition of the appropriate grid (which is then
highlighted). This enables the user to group nodes into a number of subclasses each with different
decomposition. Arbitrary partitioning, not a function of powers of two, is handled either
graphically (i.e. by enclosing the desired set of nodes) or via Scripting. An existing decomposition
can be reviewed and edited, and put in a library and made available to other users.
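The power-of-two grid partitioning described in this section can be sketched as follows. This is an illustration only, not the EAPS implementation: the function names are hypothetical, and the sketch assumes that each partitioning action halves every current partition, so that N actions on M nodes leave partitions of M/2^N nodes each.

```python
def halve(partition):
    """Split one partition (a list of node ids) into two equal halves."""
    if len(partition) % 2:
        raise ValueError("partition cannot be split into two equal parts")
    mid = len(partition) // 2
    return partition[:mid], partition[mid:]


def partition_grid(total_nodes, actions):
    """Apply `actions` halving steps to every partition of the grid."""
    partitions = [list(range(total_nodes))]
    for _ in range(actions):
        partitions = [half for p in partitions for half in halve(p)]
    return partitions


def assign(assignments, partition, entity):
    """Assigning an entity to a partition assigns it to every node inside it."""
    for node in partition:
        assignments[node] = entity


parts = partition_grid(16, 2)          # four partitions of 16 / 2**2 = 4 nodes
assignments = {}
assign(assignments, parts[0], "solver")
for part in parts[1:]:
    assign(assignments, part, "io")
```

Giving every partition the same program models an SPMD-style layout, while giving partitions different programs models a mixed one. Partitions whose sizes are not powers of two fall outside this sketch and correspond to the graphical or scripted route.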

The specific network descriptions are predefined and cover a range of standard commercial
architectures. These will include Intel Paragon, Kendall Square KSR1, CRAY MPP and Thinking
Machines CM-5. The inclusion of a number of specific descriptions allows the system to be used
as a training tool by both vendors and users of MPP systems.

Library modules are provided for a range of standard networks. These are of two types: generic
and specific. Generic networks describe rings, meshes, tori, hypercubes, cube-connected cycles
and other standard configurations. Hierarchical networks may be built from these units. For
example a network such as the KSR1 or SUPRENUM-1 is a two-level hierarchy - using a ring or
bus to interconnect a cluster (Ring:0 in the KSR1, or Cluster bus in the SUPRENUM-1) and using
a further ring (Ring:1) in the KSR to connect many Ring:0, or a grid of busses (SUPRENUM
Bus) in SUPRENUM to interconnect clusters. Network descriptions may be provided either
globally, through a script file, or locally by providing node interconnectivity information. A global
network can then be specified by replication. Networks require several parameters for a complete
description. These include connection bandwidths, communication latency, message buffering
capabilities (number and total size), support for asynchronous communication, ability to overlap
communication with computation and ability to communicate simultaneously on multiple channels.
All of these characteristics together specify a network.
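As an illustration of what a generic network module has to supply, the sketch below generates node interconnectivity for three of the standard configurations named above. The function names and the integer node numbering are assumptions made for this example; they are not the EAPS library interface.

```python
def ring_neighbors(node, size):
    """Neighbors of `node` on a ring of `size` nodes (size >= 3 assumed)."""
    return sorted({(node - 1) % size, (node + 1) % size})


def hypercube_neighbors(node, dimension):
    """Neighbors differ from `node` in exactly one bit of its binary address."""
    return [node ^ (1 << bit) for bit in range(dimension)]


def mesh_neighbors(x, y, width, height):
    """Neighbors of (x, y) on a 2-D mesh without wrap-around links."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


ring = ring_neighbors(0, 8)
cube = hypercube_neighbors(5, 3)
corner = mesh_neighbors(0, 0, 4, 4)
```

A complete description would attach the remaining parameters (bandwidth, latency, buffering and so on) to each link produced by such a generator. A two-level hierarchy can be built by treating each cluster as a single node of a second ring or grid.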

Another critical aspect of MPP design is the communication protocol between nodes. At the lowest
level, protocols may be either store and forward or worm-hole routing. At a higher level one needs
to support message typing. We intend to support several of the current protocols as predefined
modules. These will include PVM, EXPRESS, PARMACS, MPI and possibly Intel NX2. Other
standards will be added as they appear.
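The practical difference between the two low-level routing schemes can be seen with a first-order timing model. The sketch below uses the usual textbook approximations for an uncontended path and is an assumption of this example, not a model taken from EAPS: store and forward pays the full message transmission time at every hop, while worm-hole routing pays the per-hop latency at every hop but the transmission time only once.

```python
def store_and_forward_time(msg_bytes, hops, bandwidth, hop_latency):
    # The whole message is received and retransmitted at every hop.
    return hops * (hop_latency + msg_bytes / bandwidth)


def wormhole_time(msg_bytes, hops, bandwidth, hop_latency):
    # The message is pipelined through the path, so its transmission
    # time is paid once (flit-level effects are ignored here).
    return hops * hop_latency + msg_bytes / bandwidth


# Hypothetical link: 1 MB/s bandwidth, 1 microsecond per-hop latency.
sf = store_and_forward_time(1000, hops=4, bandwidth=1e6, hop_latency=1e-6)
wh = wormhole_time(1000, hops=4, bandwidth=1e6, hop_latency=1e-6)
```

For a single hop the two models coincide; the gap grows with path length and message size, which is why the routing scheme belongs in the network description.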

Communication protocol will play the same role as processor instruction set does for CPU design.
At the highest level, communication for a program or subprogram will be represented in a
simplistic way in terms of expected data volume and destination. At a lower level, individual
communication operations will be recognized, and simulated taking into account whether
asynchronous capabilities are available.

Figure 2. Decomposition of a node of Paragon

4.2. Application Decomposition and Synthesis

Similar to the architecture, the application is defined hierarchically, i.e. it can be defined as an
entity and decomposed in terms of a number of programs (or kernels). The kernels themselves can
be further decomposed in terms of other kernels. The idea of defining application kernels is not
new (e.g. Bailey, D., et al. [2], and Saavedra-Barrera, R., et al. [13], among others). SES
Workbench expands this capability by providing a limited capability of defining kernels. Our
approach differs in a significant way:

Most authors have developed the idea of using kernels in the context of benchmarking. They have
attempted to identify a number of fixed kernels to represent a large class of applications (mostly
science). Thus, the user can only use the available kernels to synthesize an application without
the ability to alter their characteristics, e.g. change their attributes, decompose them, redefine them,
or add to them. We allow the user to define any number of kernels, define their attributes in a
number of different ways, manipulate them, decompose them, combine them, and use them freely.
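A hierarchical kernel description of this kind maps naturally onto a small composite data structure. The sketch below is illustrative only: the class, its attributes (`flops`, `repeat`) and the roll-up rule are assumptions made here, not the EAPS kernel definition.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """A user-defined kernel that may itself be decomposed into kernels."""
    name: str
    flops: float = 0.0       # work done directly by this kernel (assumed attribute)
    repeat: int = 1          # how many times the kernel runs inside its parent
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return self          # returning self allows chained construction

    def total_flops(self):
        # Own work plus the work of all sub-kernels, repeated `repeat` times.
        own = self.flops + sum(c.total_flops() for c in self.children)
        return self.repeat * own


fft = Kernel("fft", flops=5e6, repeat=2)
solve = Kernel("solve", flops=1e6).add(fft)
step = Kernel("timestep", repeat=100).add(solve)
app = Kernel("app").add(step)
```

Because every level exposes the same interface, an application can be analyzed as a single entity, or any kernel can be replaced by a finer decomposition without changing the levels above it.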

An application can be represented as a single entity and analyzed as such (though it may not be
very interesting or useful). Its attributes can be specified in a number of forms. It can be assigned
to any node or all nodes of the architecture. Such analysis may be useful in understanding the
behavior of a piece of code uniformly assigned to all nodes of a system. Programs, tasks, or
functions (collectively called kernels) are the building blocks of the application and can be defined
in two different ways: by choosing program from the software menu or by double-clicking on an
existing program in the application window.

4.3. Performance Analysis


Performance analysis is the process of applying performance metrics, such as the ones described
above, to an application and then analysing the results. There are three basic strategies that can
be incorporated: analytical, benchmarking and simulation analyses. One purpose of performance
analysis is to be predictive: an engineering design project may need an advance estimate of the
computational resources it will need to run specific applications with certain grid sizes. A second
use is to optimize performance by analyzing the efficiency of an existing code. An effective tool
will provide useful information in both of these situations. The analysis component of EAPS is
based on the desire to combine the best features of benchmarking, analytical modeling, and discrete
event simulation to enable an end user (an analyst or a scientist) to evaluate (and predict) the
performance of an application on a parallel system.

4.3.1. Purpose of analysis

The analysis of a parallel system is performed to serve two distinct purposes:

a. Application performance. This reflects the application's performance and is of interest to the
user. Metrics such as response time, execution time, elapsed time, and speedup quantify the
results of this form of analysis.

b. System capacity. This form of analysis quantifies the total system capacity and is usually
represented in terms of mega-, giga-, or tera-Flops. The key concern in this form of analysis is to
represent the expected system capacity as a percentage of theoretical capacity of the system. This
ratio quantifies the unused (or idle) portion of the capacity and how much the system spends for
overhead purposes such as communication.
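The two kinds of metric can be written down directly. The sketch below uses the standard definitions of speedup and parallel efficiency, together with the capacity ratio described in item b; the function names and the sample numbers are illustrative assumptions.

```python
def speedup(serial_time, parallel_time):
    """How many times faster the parallel run is than the serial one."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Speedup per processor; 1.0 would mean perfect scaling."""
    return speedup(serial_time, parallel_time) / processors


def capacity_utilization(achieved_flops, theoretical_flops):
    """Expected system capacity as a percentage of theoretical capacity."""
    return 100.0 * achieved_flops / theoretical_flops


# Hypothetical run: 100 s serially, 12.5 s on 16 processors,
# sustaining 24 GFLOPS on a machine with a 128 GFLOPS theoretical peak.
s = speedup(100.0, 12.5)
e = efficiency(100.0, 12.5, 16)
u = capacity_utilization(24e9, 128e9)
```

In this hypothetical case the application runs 8 times faster, yet the system delivers under a fifth of its theoretical capacity. The two views answer different questions, which is why both purposes are listed.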

4.3.2. Forms of analysis

As we have already mentioned, EAPS will allow the simultaneous performance of a number of
heterogeneous analyses. Major analysis forms that eventually will be supported are:

• Analytical analysis is based entirely on previous measurements of a system, or on known
features of the design of a system. In the simplest case one might measure the vector
performance of a single processor and model it as a function of vector length. Indeed this
example is particularly well known, being described typically by the vector length
required to achieve 50% of peak performance. Such a model can then be incorporated in
analyzing performance of a code that is dominated by vector operations - provided the lengths
of vectors are known, and not determined only at runtime. Even in the case that needed
information is available only at runtime, an analytical model can still be incorporated into a
performance estimator provided the appropriate runtime parameters are available to it. Analytical
analysis is an important tool in providing understanding of the gross behavior of codes, but it is
exceedingly hard to extend to any level of detail. However providing even rough "ball-park"
estimates of performance can be important - both to system designers and to applications users.

• Simulation analysis is based on a simulation of an actual application. Typically the code is
represented schematically in some way and then executed on a simulator. Traces of various
operations and parameters are accumulated during the simulation. The key is to use an effective
representation that captures the main aspects of the code while omitting the detail that would
cause the application simulation to run endlessly. For example an operation such as a Fast
Fourier Transform may be represented as a single item in a trace, and its "cost" assigned using
an analytical or measured performance from an earlier benchmarking analysis. Sometimes one
may run a simulation analysis because the hardware target is either not available, or has not
been built yet. Here again, simulating all aspects of execution tends to be impractical in most
cases, and models for behavior of components of a system can greatly speed the simulation
without sacrificing too much predictive accuracy.

• Benchmarking analysis involves tracing and measurement of an application on real
hardware. Depending on the tracing environment used, information obtained may be as
simple as execution time or as detailed as a complete trace of all memory accesses in all
processors. For applications of significant size, complete traces might run to billions of
operations and are not practical. However traces of subroutines or smaller blocks may be
extremely helpful in building an analytical or modeled description of such blocks. Benchmarking
is also invaluable for locating critical sections and suggesting places where optimization
would be most fruitful. The nature of the benchmarking performed will typically depend on
the architecture. In the case of shared memory systems the emphasis tends to be on tracing
access patterns to shared memory locations, while for distributed memory systems, tracing of
interprocessor communication patterns is most relevant.

• User-defined code. The performance of an entity can be determined by executing custom
developed code which may be a new algorithm, an alternate description of behavior of an entity,
or a new simulation of a device.

• Data from actual code. An extreme case of flexibility is the incorporation of performance
data from the execution of actual code on a target architecture.
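The vector-performance model mentioned under analytical analysis can be made concrete. The sketch below uses the common two-parameter form in which the achieved rate approaches an asymptotic rate for long vectors and reaches exactly half of it at a characteristic vector length; the parameter names `r_inf` and `n_half` and the sample values are assumptions of this example.

```python
def vector_rate(n, r_inf, n_half):
    """Achieved rate (operations/second) for a vector operation of length n.

    r_inf  - asymptotic rate for very long vectors
    n_half - vector length at which half of r_inf is achieved
    """
    return r_inf * n / (n + n_half)


def vector_time(n, r_inf, n_half):
    """Time for one vector operation of length n under the same model."""
    return (n + n_half) / r_inf


# Hypothetical processor: 300 Mflops asymptotic rate, half performance at n = 100.
half_rate = vector_rate(100, 300e6, 100)
long_rate = vector_rate(100_000, 300e6, 100)
```

An estimator built on such a model needs only the vector lengths of the code being analyzed, which is why it can still be used when those lengths become known only at runtime.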

4.4.  System  optimization  and  dynamic  reallocation  of  tasks 

A major consideration in the design of parallel systems is the optimization of performance by
reassigning tasks to system nodes. Although at first it may seem that equalizing node utilization
may lead to an optimal system, in reality this is far from true for two reasons: (1) equalizing node
utilization may lead to higher communication overhead and lower efficiency, and (2) since tasks are
not infinitely divisible, achieving uniform node utilization is an impossibility. Therefore, the
problem can be viewed as a non-linear constrained optimization problem.

Independent of how the problem is formulated or solved, the optimization is useful only if it can be
done dynamically, i.e. during processing tasks are reassigned based on a pre-defined criterion.
Dynamic reassignment of tasks adds another dimension to an already difficult problem.
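A small static example shows why equal node utilization is not the same as optimal performance. The sketch below is a toy model written for illustration, not the EAPS optimizer: the cost function, which charges both endpoints for traffic that crosses nodes, and the exhaustive search are assumptions that are only practical for a handful of tasks.

```python
from itertools import product


def completion_time(assignment, work, traffic, comm_cost):
    """Busiest node's load, counting computation plus cross-node communication."""
    load = {node: 0.0 for node in set(assignment)}
    for task, node in enumerate(assignment):
        load[node] += work[task]
    for (a, b), volume in traffic.items():
        if assignment[a] != assignment[b]:
            # Both endpoints pay for a message that crosses nodes.
            load[assignment[a]] += comm_cost * volume
            load[assignment[b]] += comm_cost * volume
    return max(load.values())


def best_assignment(work, traffic, comm_cost, num_nodes):
    """Exhaustively try every task-to-node mapping and keep the fastest."""
    candidates = product(range(num_nodes), repeat=len(work))
    return min(candidates,
               key=lambda a: completion_time(a, work, traffic, comm_cost))


work = [4.0, 4.0, 2.0, 2.0]          # indivisible task sizes
traffic = {(0, 1): 5.0}              # tasks 0 and 1 exchange a lot of data
balanced = (0, 1, 0, 1)              # equal computation on both nodes
best = best_assignment(work, traffic, comm_cost=1.0, num_nodes=2)
```

Here the perfectly balanced mapping finishes in 11 time units because the two communicating tasks sit on different nodes, while the unbalanced mapping that keeps them together finishes in 8. A dynamic scheme would have to repeat this kind of trade-off during execution, as task sizes and traffic change.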

4.5.  Libraries 

Libraries are the critical building blocks that allow users to reuse previously developed simulation
specifications, and to create new ones in a methodical fashion. A library stores a node description,
network topology, program or other specification in a form where it may later be accessed by
referring to its name and providing appropriate parameters. Libraries also encode default values
for attributes and allow applications to access library modules transparently. Libraries are the
primary means of providing performance information which has been previously obtained either by
analytic or experimental simulation.

Libraries are allowed to be hierarchical. While all elements of a library are regarded as equivalent,
a library may utilize a lower-level library to which it places calls. No cycling is allowed - the
lower-level library may not call the higher level one for example. Hierarchical libraries may be used to
define multilevel descriptions of entities. In this case, the top level library modules record the top
level view of a node, network or program, and, under control of a parameter, may either return this
description, or may make a call to the next level library to give a more detailed representation.

4.6. Application Program Interface (API)

Application Program Interface (API) expands the capability of EAPS by allowing the user to
develop, edit, review, and execute other programs from within EAPS. Upon receipt of the Execute
command, EAPS is temporarily exited to execute the selected program, and return. Major
applications of API are to interface with other tools, to develop and execute custom developed
code, and to develop benchmarks and include them in the other forms of analysis.

4.7. Tracing and profiling capability

Tracing and profiling, an intermediate step between system definition and analysis, enables EAPS
to accept performance, attribute, and other data about a system entity from an external source, e.g.
other tools, or via the execution of actual code. Its major use is deriving (or specifying) the
attributes of complex entities that will be difficult to find otherwise. A typical example is defining
the attributes of a decomposed node containing computing and communication processors, local
memory, cache, and registers. Predicting the performance of such a complex entity, analytically
(or via simulation), can be difficult. An alternate way will be to incorporate data (derived either
from executing an actual code, a benchmark, or from another tool) into EAPS. Tracing will be an
option in the definition form of items being specified and permits the user to name a file (or a tool)
to find the attributes of the desired entity.

Abstract

Dependable systems are needed to meet the demands of critical real-time applications. The
reliability, response time and recovery time requirements are used to divide the set of dependable
systems into three classes: ultra-dependable systems, highly dependable systems, and highly
available systems. Other system characteristics implicit in the class can then be extracted. This
scheme is applied to a variety of existing dependable systems.


1  Introduction 

Critical real-time applications depend upon embedded digital systems to perform speedy and precise
computations. In the past, fault-tolerance methods that assured error-free results were developed
separately from methods that guaranteed real-time performance. The assumptions made to address
real-time issues were often in conflict with those needed for fault-tolerance and vice versa. As
a result, few systems exist that can be guaranteed to meet critical real-time constraints in the
presence of faults. Both current and future applications, such as aircraft flight control, high-speed
communication, and on-line data retrieval require dependable systems. Such systems must address
real-time and fault-tolerance constraints to meet their error-free service requirements. The generic
concept of dependability encompasses the issues and techniques most commonly used to identify,
implement, and measure system fault-tolerance and real-time performance. As defined in [1],
dependability is a qualitative term that describes the trustworthiness of a computer system. Our
goal is to use characteristics relevant to both real-time and fault-tolerant concerns in identifying
dependable system features appropriate for a given application.

Before proceeding, we review terminology essential to this discussion. Then, we discuss the
use of an application's reliability, response time, and recovery time requirements in determining
essential features of candidate dependable systems. We conclude by applying this method to existing
fault-tolerant and real-time systems.

2  Fundamental  Terminology 

The design of a dependable real-time system must integrate issues common to both fault tolerant
and real time systems. As shown in Figure 1, from [1], dependability can be partitioned into
impairments, means and measures. Dependability impairments consist of failures, errors, and
faults which can prevent service requirements from being met. Dependability means include fault
avoidance, where faults are prevented through the use of intrinsically reliable components or formal
methods; fault tolerance, where the effects of errors are masked through the use of redundant
elements; error removal, where the presence of latent errors is minimized; and error forecasting,
where the presence, creation, and consequences of errors are estimated.*

Figure 1: Dependability tree

*Supported in part by ONR, Contract # N00014-91-C-0014

Dependability measures include availability and reliability. The availability of a system, A(t),
is the probability that the system is operational at the instant of time t. If the limit of A(t) as
t approaches infinity exists, that limit represents the fraction of time that the system is capable
of performing useful work. The reliability of a system, R(t), is the conditional probability that a
system will be operational at time t = τ, given that it was operational at time t = 0. Thus, it is
typically more difficult to guarantee reliability than availability [2]. The numerous tradeoffs between
reliability and availability significantly affect system life-cycle costs. The reliability requirement of
correct operation throughout an interval is more stringent (and expensive) than the instantaneous
availability requirement. In fact, some applications may be able to tolerate minutes or even hours
of unavailability while a failed system recovers. Thus, the recovery time requirements also identify
dependable system parameters suitable for a given application.
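
The gap between the two measures can be made concrete with a small calculation. The sketch below
assumes a constant failure rate and a constant repair rate (an exponential model that this paper
does not prescribe), under which R(t) = exp(-lambda t) and the limiting availability is
mu / (lambda + mu). The numeric rates are hypothetical.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t): probability of no failure during [0, t]."""
    return math.exp(-failure_rate * t)


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Limit of A(t) as t grows: mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


# Hypothetical figures: one failure per 1000 hours, repairs take 1 hour on average.
LAMBDA = 1.0 / 1000.0
MU = 1.0

if __name__ == "__main__":
    print(f"R(100 h)     = {reliability(LAMBDA, 100.0):.4f}")
    print(f"availability = {steady_state_availability(LAMBDA, MU):.4f}")
```

With these assumed rates the system is available about 99.9% of the time, yet the probability of
running a 100 hour mission without any failure is only about 90%, which illustrates why the
reliability requirement is the more stringent of the two.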

A dependable system must perform functions or provide services within a time frame determined
by the application requirements. In a general sense, fault-tolerance focuses on improving system
reliability and availability, by supporting continual correct operation after faults occur or restoring
operations after system failure. Similarly, real-time research focuses on ensuring the timeliness
of services. Real-time systems have three basic task components: reading inputs, performing
computations, and producing outputs. Services must be delivered within finite time intervals as
dictated by the application response time and other performance requirements. There may be
several modes of operation (such as takeoff, cruise, and land) which can cause the workload to
increase or decrease. In a hard real-time system, computations must be performed with a specified
frequency or response time. The failure of a task to start or to complete on time can have intolerable
consequences, such as loss of system control or loss of life. Conversely, in a soft real-time system
missed deadlines may be tolerable. A task may start or complete late without system failure,
provided that the workload behavior satisfies an acceptable statistical distribution. Systems with
hard deadlines often rely on worst case scheduling analyses to guarantee task deadlines. Systems
permitting soft deadlines may schedule tasks based on average behavior to improve performance
and response time. An in-depth study of real-time computing can be found in [3].

*The reader is referred to [1] for a detailed discussion of dependability issues.
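
As one illustration of a worst case scheduling analysis, the sketch below applies the classical
utilization bound for rate-monotonic scheduling, U <= n(2^(1/n) - 1). The paper does not name a
particular analysis, so the choice of this test is an assumption, as are the task sets. The test
presumes independent periodic tasks, preemptive fixed priorities, and deadlines equal to periods.
It is sufficient but not necessary: a task set that fails it may still be schedulable.

```python
def rm_utilization_bound(n: int) -> float:
    """Utilization bound for n periodic tasks under rate-monotonic priorities."""
    return n * (2.0 ** (1.0 / n) - 1.0)


def passes_rm_test(tasks) -> bool:
    """tasks is a list of (worst_case_execution_time, period) pairs."""
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


# Hypothetical task sets, expressed in the same time unit.
LIGHT_LOAD = [(1, 4), (1, 5), (2, 10)]   # utilization 0.65
HEAVY_LOAD = [(2, 4), (2, 5), (2, 10)]   # utilization 1.10

if __name__ == "__main__":
    print(passes_rm_test(LIGHT_LOAD), passes_rm_test(HEAVY_LOAD))
```

The analysis uses worst case execution times rather than averages, which is what allows a hard
deadline to be guaranteed in advance instead of being observed statistically.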

3  Dependable  System  Classes 

In previous work, fault-tolerant real-time systems were distinguished by their reliability and
deadline requirements. This often leads to confusion during the design stage because the target system
reliability range and deadlines (hard or soft) are not sufficient to further determine fault-tolerant or
real-time system requirements. In this section, we use the additional parameters of response
and recovery time to identify general classes of suitable systems. Other system features are then
implicit in the class.

Large markets with specific demands have spawned systems targeted for several types of
dependable computing applications, such as telephone service, on-line transaction processing, and
real-time control. Each of these systems is required to provide dependable service, but with very
different reliabilities and response times. The discussion below is based on three different graphs.
The upper graph of Figure 2 shows the application space with respect to the attributes of
reliability and response time. The lower graph of Figure 2 shows suitable recovery techniques with
respect to reliability and recovery time. These graphs are combined in Figure 3 to yield a composite
dependable system overview.

In Figure 2, we have highlighted three separate system classes: Highly Available Systems
(HAS), Ultra-Dependable Systems (UDS), and Highly Dependable Systems (HDS). Each region
accommodates applications which require the given levels of reliability and response time. High
availability systems address the needs of on-line transaction processing applications. Banking, sales,
inventory control, and telephone systems must be available continuously, and the integrity of shared
databases must be maintained. These systems are distinguished by their ability to tolerate short
down-times in the range of minutes to hours. Repair or recovery from faults can often be postponed
until predefined maintenance intervals, without loss of life or property. Mission times for HAS are
on the order of 100 hours, significantly longer than those of other systems.

Highly dependable systems emphasize reliability over availability because particular applications
need to be completed without interruption. The class of HDS accommodates applications that
accept mission failures on the order of one in 10* to 10^ missions, with mission times between 10
and 100 hours. Since a physical process may be controlled, typical response and recovery times
can range from milliseconds to seconds. System applications which are essential, but may have
backups or permit human intervention, are typically implemented using HDS. Many submarine
and battleship fire control applications can be accommodated by HDS, as can communication
systems, security systems, and environmental control systems.

Ultra-dependable systems represent applications where loss of the computer system is
unacceptable, as it may cause loss of life or destroy extremely expensive property. Very quick response
times are needed because even the most basic application functions depend on the computer. For
example, a momentary loss of computing power in an aircraft flight controller can cause the plane
to become unstable and leave the safe-flight envelope. The reliability requirements are extremely
high with failure probabilities in the range of  to 10~* for mission times of ten hours or less.
The acceptable recovery times are so small that expensive resource replication and forward error
recovery* methods must be used to mask faults before they can cause system failure. Historically,
fault-tolerance concerns have dominated real-time performance concerns. Application designers
were often required to hand-craft new task schedules, even when only minor changes occurred in
the workload, increasing the potential for errors in the system.

The lower graph of Figure 2 shows the tradeoffs between reliability and recovery time
requirements. Applications that emphasize high availability over long mission times can afford slower fault
recovery times than applications with shorter mission times. Applications with short mission times
must typically react to the physical environment. Continuous availability is expected because any
interruption in service can be catastrophic if it is not immediately recoverable. When very fast
recovery times are needed, fault masking must be used as there is no time to interrupt processing
for recovery. Active redundancy management policies become feasible when recovery times are in
the range of milliseconds to seconds. Finally, more time consuming policies, such as checkpointing and
roll-back or reconfiguration, are allowable with longer recovery times.
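
The mapping from recovery time budget to recovery technique described above can be sketched as a
simple lookup. The two cut-off values below are hypothetical placeholders: the text gives only
qualitative ranges ("very fast", "milliseconds to seconds", "longer"), so a real design would take
the boundaries from the lower graph of Figure 2.

```python
# Hypothetical boundaries between the three regions, in seconds.
MASKING_LIMIT_S = 1e-3   # below this there is no time to interrupt processing
ACTIVE_LIMIT_S = 10.0    # upper end of the "milliseconds to seconds" range


def recovery_policy(recovery_budget_s: float) -> str:
    """Return the class of recovery technique that fits the time budget."""
    if recovery_budget_s < MASKING_LIMIT_S:
        return "fault masking"
    if recovery_budget_s <= ACTIVE_LIMIT_S:
        return "active redundancy management"
    return "roll-back or reconfiguration"


if __name__ == "__main__":
    for budget in (1e-4, 0.5, 600.0):
        print(budget, "->", recovery_policy(budget))
```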

The above observations are combined in Figure 3 to provide a composite view of dependable
system characteristics. We have noted several representative systems for each class. The number of
prospective applications continues to increase due to the fast dynamic response of computers and the
inherent reliability of integrated digital electronics. The applications can be further examined with
respect to the user's emphasis on different characteristics of dependability, specifically: availability,
safety, and fault tolerance. Distributed approaches to these problems have been selected since they
possess many appealing characteristics, such as avoidance of a single point of failure and the ability
to physically distance resources. These classes address the different trends envisioned for the next
generation of computing resources and applications. They also identify many intriguing practical
problems and research topics in designing and evaluating dependable computing systems for time
critical applications.

4  Discussion 

In this paper, we have presented three classes of dependable systems suitable for different sets of
application requirements. We have added response time and recovery time requirements to the
system reliability and deadline requirements commonly used to characterize dependable systems. We
divided the set of dependable systems into three classes: highly available systems, highly
dependable systems, and ultra-dependable systems. Then, we mapped many existing system approaches
into this framework. While the granularity of this division may not be sufficient for all critical
real-time applications, it provides a system designer with a guide to appropriate fault-tolerance
and performance enhancement techniques.

The issues of concern in developing dependable systems cover all phases of the system life-cycle.
It is important to understand that the payoff associated with fault tolerance may be apparent only
when the complete life-cycle of the system is examined. The strategies formulated during the initial
design stage have a significant impact on the resulting system dependability and cost. Thus, system
design and analysis based on this framework can ensure that parameters critical to the application
are identified early in the system design stage. While the development of theoretical and practical
approaches to dependability continues, the design of dependable computers still remains an art.

*Forward error recovery refers to the use of masking techniques to correct system operations in the
presence of faults.


ABSTRACT

Real-time database systems (RTDBS) for complex systems have transactions with explicit timing
constraints, such as deadlines. Conventional database systems are not designed for time-critical applications
and lack features required for supporting real-time transactions. Meeting the requirements of RTDBS
will require a balanced and coordinated effort between concurrency control and transaction scheduling.
In this paper, we focus on two issues: predictability and serializability. One approach is to combine
existing concurrency control protocols with real-time scheduling algorithms. To meet more deadlines,
concurrency control protocols can be modified to favor more urgent transactions. Another approach is to
explore the non-serializable semantics in real-time applications. We survey and discuss many techniques
that can be used to design and implement real-time databases.

1. Introduction

As our society becomes more integrated with computer technology, information processing for
human activities necessitates distributed and fault-tolerant computing that responds to requests in
real-time. Many computer systems are now used to monitor and control physical devices and large complex
systems which must support real-time capability. Since the real world is constantly evolving, it is very
important for system designers to design and implement real-time systems so that they can always keep
up with the real world.

Real-time database systems (RTDBS) have (at least some) transactions with explicit timing
constraints, such as deadlines. RTDBS are becoming increasingly important in a wide range of applications
such as aerospace and weapon systems, computer integrated manufacturing, robotics, nuclear power
plants, network management, and traffic control systems. Unfortunately, conventional database systems
are not designed for time-critical applications and lack features required for supporting real-time
transactions. They are designed to provide good average performance, while possibly yielding unacceptable
worst-case response times. It has been generally recognized that there is a lack of basic theory for
real-time database systems since the traditional models are not adequate for time-critical applications [Abb88,
Buc89, Kor90, Lin89, LS90, Raj89, SRL88, Sha91, Son88, Son88b, Son90, Son91, Son92, Son93].

Typical real-time systems try to meet the timing constraints of individual tasks, but ignore data
consistency problems. Task and transaction abstractions are similar in the sense that both are units of work
as well as units of scheduling. However, tasks and transactions are different computational concepts and
their differences affect how they are controlled. In real-time task scheduling, it is usually assumed that all
tasks are preemptable. Preemption of a transaction that uses a file resource in an exclusive mode of
writing may result in subsequent transactions reading inconsistent information. In addition, while the runtime
behavior of a task is statistically predictable, the behavior of a transaction is dynamic, making it difficult
to predict its execution time with accuracy.

Most real-time database operations are characterized by (1) their time constrained access to data,
and (2) access to data that has temporal validity. These operations involve gathering data from the
environment, processing the gathered information in the context of information acquired in the past, and
providing timely responses. The operations also involve processing not only archival data but also
temporal data which loses its validity after a certain time interval. Both the temporal nature of the data
and the response time requirements imposed by the environment are transaction timing constraints
handled by either periods or deadlines. Therefore, the correctness of real-time database operation depends not
only on the logical computations carried out but also on the time at which the results are delivered. The
goal of real-time database systems is to meet the timing constraints of transactions.

A key point to note here is that real-time computing is not equivalent to fast computing. Rather than
being fast, more important properties of RTDBS should be timeliness, i.e., the ability to produce expected
results early or at the right time, and predictability, i.e., the ability to function as deterministically as
necessary to satisfy system specifications including timing constraints [Stan88]. Fast computing which is busy
doing the wrong activity at the wrong time is not helpful for real-time computing. While the objective of
real-time computing is to meet the individual timing constraint of each activity, the objective of fast
computing is to minimize the average response time of a given set of activities. Fast computing is helpful in
meeting stringent timing constraints, but fast computing alone does not guarantee timeliness and
predictability. In order to guarantee timeliness and predictability, we need to handle explicit timing constraints,
and to use time-cognizant techniques to meet deadlines or periodicity associated with activities.
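
The difference between the two objectives can be seen on a three-job example. The sketch below is
illustrative only: the job set is hypothetical, all jobs are assumed to be released at time 0 and
to run without preemption on one processor, and the two orderings (shortest job first and earliest
deadline first) are standard policies chosen here to stand for "fast" and "time-cognizant"
scheduling respectively.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    exec_time: int   # processing time needed
    deadline: int    # absolute deadline; every job is released at time 0


def simulate(jobs, order_key):
    """Run jobs back to back in the order given by order_key.

    Returns (average response time, names of jobs that miss their deadline).
    """
    clock = 0
    finish = {}
    for job in sorted(jobs, key=order_key):
        clock += job.exec_time
        finish[job.name] = clock
    average = sum(finish.values()) / len(jobs)
    missed = [job.name for job in jobs if finish[job.name] > job.deadline]
    return average, missed


# Hypothetical workload, chosen only to expose the difference.
JOBS = [Job("A", 4, 4), Job("B", 1, 10), Job("C", 2, 10)]

shortest_first = simulate(JOBS, order_key=lambda j: j.exec_time)
earliest_deadline = simulate(JOBS, order_key=lambda j: j.deadline)

if __name__ == "__main__":
    print("shortest job first:     ", shortest_first)
    print("earliest deadline first:", earliest_deadline)
```

Shortest-job-first yields the lower average response time (11/3 versus 16/3 time units) but lets
job A miss its deadline, whereas the deadline-aware order meets all three deadlines at the cost of
a worse average.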

RTDBS have some unique requirements. The design and implementation of RTDBS
introduces many new and interesting problems: What is an appropriate model for real-time transactions and
data? What are the language constructs that can be used to specify real-time constraints? What are the
measures of system predictability? How should real-time transactions be scheduled? What is the best
concurrency control scheme that considers real-time constraints and importance of transactions? Is
serializability an appropriate correctness criterion for RTDBS? In this paper we try to answer some of the
questions and review current approaches to the design of RTDBS. We focus on two important issues:
predictability and serializability.

The remainder of this paper is organized as follows. In Section 2, we first describe some of the
characteristics and requirements of RTDBS, and discuss issues involved with schedulability,
predictability, and non-serializable executions. In Section 3, we discuss priority-based conflict resolution
mechanisms and review some of the algorithms that are based on serializability. Section 4 presents
techniques for generating a set of schedules that are non-serializable but acceptable to RTDBS. Finally,
concluding remarks with future research issues are summarized in Section 5.

2.  RTDB  Characteristics  and  Requirements 

The reasons why conventional database systems are not used in real-time applications include their
poor performance and their lack of predictability. In conventional database systems, transaction
processing requires access to a database stored on secondary storage; thus transaction response time is affected
by disk access delays, which can be in the order of milliseconds. Although these databases are fast
enough for traditional applications in which a response time of a few seconds is often acceptable, these
systems may not be able to provide a response fast enough for high-performance real-time applications.
One common approach to achieve high performance is to replace slow devices (e.g., a disk) by high
speed devices (e.g., a large main memory). Another alternative is to use special techniques to increase
the degree of concurrency [Son88b].

Since real-time systems are often used in safety-critical applications, they must provide predictable
performance. An unpredictable system can do more harm than good under abnormal conditions. There
are many reasons why traditional database systems may have unpredictable performance. For example,
to ensure data consistency, traditional database systems often block certain transactions from reading
or updating data if these data are locked by other transactions. Blocking will cause transactions to be
delayed. Even worse, it is often difficult for a transaction to predict how long the delay will be since the
blocking transactions themselves in turn may be blocked by other transactions. Consequently, the
response time for a transaction in conventional database systems is often unpredictable.

Moreover, databases in many real-time systems have the following unique problems:

(1) Many data objects in a database correspond to active data objects in the real world. Their values
may change by themselves, regardless of the database state and activities.

(2) A real-time database may never be completely correct. As soon as a real-world value is recorded
in the database, it may already be out of date.

(3) Different data objects in a database may be recorded at different rates. Their values may not
co-exist in the same real-world snapshot.

Due to these problems, RTDBS need special protocols to ensure that all transactions executed are
necessary and useful. A transaction execution is necessary only if it can help other more critical transactions
produce correct results (both in terms of time and value). A transaction execution is useful only if its
result can be applied to the system without producing any adverse effect to the system mission. Toward
these goals, an execution using some traditional protocol may not be acceptable to RTDBS, while
RTDBS may accept some transaction executions using unconventional measures.

In the following, we compare real-time database systems with other types of databases such as
CAD/CAM databases and business databases (Table 1). From these comparisons, we may have a better
picture of the true requirements of real-time database systems.


Table 1. Database Characteristics

Applications      | Data Characteristics      | Transaction Characteristics | Performance Requirements
------------------+---------------------------+-----------------------------+-------------------------
Real-Time Systems | Device inputs             | Periodic, short             | Schedulability
                  | System and machine state  | Event-driven                | Hard deadline
                  | System history/statistics | Producer/consumer           | Graceful degradation
                  | Multi-media               | Frequent updates            | System reliability
                  | Temporal attributes       | Decision making             | Availability
                  | Quality attributes        | Associative (or no) search  | Best effort
------------------+---------------------------+-----------------------------+-------------------------
CAD/CAM           | Design data               |                             | Version control
                  | Complex 3-D structure     |                             | Special concurrency
                  | Hierarchical              |                             | Friendly user interfaces
                  | Persistent objects        |                             | Flexible manipulation
------------------+---------------------------+-----------------------------+-------------------------
Financial Systems | Large volume              | Large table search          |
                  | Simple objects            | Complex query               |
                  | Flat structure            | Storage intensive           |
                  | Numeric or text           | Atomic actions              |

2.1. Data Characteristics

Since real-time systems are used to monitor and to control physical devices, they need to store a
large amount of information about their environments. Such information includes input data from
devices as well as system and machine states. In addition, many embedded systems must also store the
system execution history for maintenance or error recovery purposes. Some systems may also keep track of
some system statistics like average system load or average device temperature. Depending on the
applications, real-time systems may have to handle multi-media information like audio (for sonar devices),
graphics (for radar devices), and images (for robots). Since systems are constantly recording information,
data must have their temporal attributes recorded. Also, some input devices may be subject to noise
degradation and need to record the quality of the attributes along with the data.

Often a significant portion of a real-time database is highly perishable in the sense that it may
contribute to a mission only if used in time. In addition to deadlines, therefore, other kinds of timing
constraints could be associated with data in RTDBS. For example, each sensor input could be indexed by the
time at which it was taken. Once entered into the database, data may become out-of-date if it is not
updated within a certain period of time. To quantify this notion of "age", data may be associated with a
valid interval [Liu88, SL92]. Data outside its valid interval does not represent the current state. What
occurs when a transaction attempts to access data outside its valid interval depends on the semantics of
data and the particular system requirements.
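
A valid interval can be attached to each data item together with its sampling time. The sketch
below is one possible representation, not a structure taken from the cited work: the class name,
field names, and the `allow_stale` switch are hypothetical, and the switch only stands in for the
application-specific policy that decides what happens on a stale read.

```python
from dataclasses import dataclass


class StaleDataError(Exception):
    """Raised when a datum is read outside its valid interval."""


@dataclass(frozen=True)
class TemporalDatum:
    value: float
    sampled_at: float   # time at which the sensor reading was taken
    valid_for: float    # length of the valid interval, same time unit

    def is_valid(self, now: float) -> bool:
        """True while `now` lies inside [sampled_at, sampled_at + valid_for]."""
        return self.sampled_at <= now <= self.sampled_at + self.valid_for

    def read(self, now: float, allow_stale: bool = False) -> float:
        """Return the value, or reject the access if the datum has aged out."""
        if not allow_stale and not self.is_valid(now):
            raise StaleDataError(f"datum sampled at {self.sampled_at} is stale at {now}")
        return self.value


if __name__ == "__main__":
    reading = TemporalDatum(value=21.5, sampled_at=100.0, valid_for=5.0)
    print(reading.is_valid(103.0), reading.is_valid(106.0))
```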

In comparison, data in CAD/CAM applications are mostly related to design information. They often
have complex and hierarchical structures. Data in financial systems, on the other hand, are much simpler
and flat in structure. However, financial systems usually have a large volume of data to be processed.

2.2. Transaction Characteristics

Transactions in real-time database systems can be categorized as hard and soft transactions. We
define hard real-time transactions as those transactions whose timing constraints must be guaranteed.
Missing the deadlines of this type of transaction may result in catastrophic consequences. In contrast,
soft real-time transactions have timing constraints, but there may still be some justification in completing
the transactions after their deadlines. Catastrophic consequences do not result if soft real-time transactions
miss their deadlines. Soft real-time transactions are scheduled taking their timing requirements into
account, but they are not guaranteed to make their deadlines. There are many real-time systems that need
database support for both types of transactions.

Many transactions in real-time database systems are used to record device readings or to handle
system events. They are either periodic or event driven. Most of the transactions are short since they must be
responsive to their environment. Transactions can be viewed either as producers or consumers for certain
input data. For periodic transactions, most will perform updates in every period. The transactions
triggered by events must often make decisions. Due to the real-time constraints, either no search or an
associative search mechanism is preferred.

In comparison, CAD/CAM transactions are more iterative in structure and longer in duration. They
need to share information among group users. Also, many frequent updates will be conducted before a
design is finalized. For financial applications, the queries requested could be quite complex and require
long searches through large tables. Financial transactions therefore require fast high-volume storage
devices to speed up their operations. To prevent data inconsistency, financial transactions usually are
executed as atomic actions.

2.3. Performance Requirements

For CAD/CAM applications, systems must control the versions evolved during the design process.
Due to the cooperation among team users, special mechanisms must be provided to facilitate concurrent
access from different users. A friendly user interface and flexible manipulation primitives are definitely
desirable. For financial applications, the most important performance criterion is the control of
concurrent transactions in order to ensure that the effect of concurrent executions is equivalent to the effect
those transactions would have had if they had been run in some serial order. This well-known
serializability goal in transaction processing is well established as an appropriate notion of correctness for
conventional transaction scheduling. In contrast to real-time systems, conventional database systems do not
emphasize the notion of timing constraints or deadlines for transactions. The performance goal is to
reduce the response times of transactions by using a serialization order among conflicting transactions.
For example, the most commonly used two-phase locking (2PL) protocol [Bern87] synchronizes
concurrent data access of transactions by blocking and thus may violate the timing constraints of
transactions.
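
Whether an interleaved execution is equivalent to some serial order can be checked mechanically.
The sketch below uses the standard precedence-graph test for conflict serializability, which is a
sufficient condition for serializability and is brought in here as an illustration rather than as
a method prescribed by this paper. The schedule encoding (transaction, operation, item) and the
example schedules are hypothetical.

```python
def is_conflict_serializable(schedule) -> bool:
    """schedule is a list of (txn, op, item) with op equal to "r" or "w".

    Two operations conflict when they come from different transactions, touch
    the same item, and at least one is a write. Each conflict adds an edge
    from the earlier transaction to the later one; the schedule is conflict
    serializable exactly when the resulting graph has no cycle.
    """
    edges = {}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        edges.setdefault(ti, set())
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges[ti].add(tj)

    unvisited, in_progress, done = 0, 1, 2
    state = {txn: unvisited for txn in edges}

    def has_cycle(txn) -> bool:
        state[txn] = in_progress
        for nxt in edges[txn]:
            if state[nxt] == in_progress:
                return True
            if state[nxt] == unvisited and has_cycle(nxt):
                return True
        state[txn] = done
        return False

    return not any(state[txn] == unvisited and has_cycle(txn) for txn in edges)


# Hypothetical schedules over a single item x.
SERIAL_LIKE = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
INTERLEAVED = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]

if __name__ == "__main__":
    print(is_conflict_serializable(SERIAL_LIKE), is_conflict_serializable(INTERLEAVED))
```

In the second schedule T1 reads x before T2 writes it and then writes x after T2, so each
transaction must precede the other; no serial order produces the same effect.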

The most important requirement for real-time applications is to provide a feasible schedule so that
transactions can meet their hard deadlines. Systems also must degrade gracefully since their applications
are often safety-critical. System reliability is definitely an important issue for these systems. To make
fast and correct decisions, data availability is critical to system performance. Also, due to timing
constraints, many decision-making processes can only use the best-effort approach; they must stop at the
deadlines. In RTDBS, the timeliness of a transaction is usually combined with its criticality to calculate
the priority of the transaction. Therefore, proper management of priorities and conflict resolution in
real-time transaction scheduling is essential for ensuring the predictability and responsiveness of RTDBS.
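
The text does not give a formula for combining timeliness with criticality, so the following is
only one plausible combination, stated as an assumption: more critical transactions are served
first, and among equally critical ones the earliest deadline wins. The transaction names and
values are hypothetical.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    name: str
    deadline: float     # absolute deadline
    criticality: int    # larger means more critical


def priority_key(txn: Txn):
    """Sort key: higher criticality first, then earlier deadline."""
    return (-txn.criticality, txn.deadline)


def dispatch_order(txns):
    """Names of the transactions in the order they would be dispatched."""
    return [txn.name for txn in sorted(txns, key=priority_key)]


# Hypothetical ready queue.
READY = [Txn("log", 5.0, 1), Txn("brake", 20.0, 3), Txn("display", 2.0, 1)]

if __name__ == "__main__":
    print(dispatch_order(READY))
```

Other combinations, such as weighting slack by criticality, fit the same interface by replacing
`priority_key`.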

2.4. Issues on Predictability and Serializability

One of the most interesting challenges in RTDBS is the creation of a unified theory for real-time
scheduling and concurrency control that maximizes both concurrency and resource utilization subject to
three constraints: data consistency, transaction correctness, and transaction deadlines [Stan88]. While
the theories of concurrency control in database systems and real-time task scheduling have both
advanced, difficult problems remain in the interaction between concurrency control protocols and
real-time scheduling algorithms. In database concurrency control, the objective is to provide a high degree of
concurrency and thus faster average response time without violating data consistency. In real-time
scheduling, it is desirable to maximize resource usage, such as CPU utilization, subject to meeting
timing constraints. If the system is not designed properly, it may be impossible to meet both objectives: a
transaction may need to be blocked if it is in conflict with another transaction, yet it must be executed
immediately to meet its deadline.

The first issue we want to address is the predictability of real-time database systems. As stated
above, the goal of scheduling in RTDBS is to meet timing constraints. Many real-time task scheduling
methods can be extended to real-time transaction scheduling, while concurrency control protocols are still
used to maintain data consistency. The general approach is to utilize existing concurrency control
protocols, especially two-phase locking, and to apply time-critical transaction scheduling methods that
favor more urgent transactions [Son89]. Such approaches have the inherent disadvantage of being limited by
the concurrency control protocol upon which they depend, since all existing concurrency control
protocols synchronize concurrent data access of transactions by a combination of two measures: blocking
and roll-backs of transactions. Both are barriers to meeting time-critical schedules. Several recent
projects have tried to integrate real-time constraints with database technology to facilitate efficient
and correct management of timing constraints in RTDBS. In Section 3, we will discuss some of the projects
and compare the results.

The second issue that we will discuss is the serializability of RTDBS. Traditional concurrency
control protocols induce a serialization order among conflicting transactions. Although in some
applications weaker consistency is acceptable [Gar83], a general-purpose consistency criterion that is
less stringent than serializability has been difficult to find. However, RTDBS may have a different notion
of “correct execution” in transaction processing. Based on the argument that timing constraints may be
more important than data consistency in RTDBS, attempts have been made to satisfy timing constraints by
temporarily sacrificing database consistency to some degree [Lin89, Vrb88]. These attempts are based on a
new consistency model of real-time databases, in which maintaining external data consistency (values of
data objects represent correct values of the external world outside the database) has priority over
maintaining internal data consistency (no data violates consistency constraints). Moreover, a real-time
transaction may include a temporal consistency requirement that specifies the validity of data values
accessed by the transaction. While a deadline can be thought of as providing a time interval as a
constraint in the future, temporal consistency specifies a temporal window as a constraint in the past.
As long as the temporal consistency requirement of a transaction can be satisfied, the system may want to
provide an answer using available (possibly non-serializable) information. In Section 4, we investigate
several of the protocols which provide non-traditional (i.e. non-serializable) transaction schedules that
are acceptable in RTDBS.

3.  Serializable  Execution  of  Real-Time  Transactions 

Conventional transaction processing requires controlling the concurrent execution of transactions to
ensure serializability. However, the notion of serializability may be too restrictive for real-time
transactions. In real-time database systems, guaranteeing temporal consistency requirements might be more
critical than satisfying the conventional notion of serializability.

In this section, we review some of the proposed scheduling and concurrency control algorithms,
based on serializability, for real-time transactions. To enforce serializability, they use conventional
scheduling and concurrency control schemes such as two-phase locking (2PL), timestamp ordering (TO),
and optimistic concurrency control (OCC) as their basis. They combine those conventional schemes with
priority-based conflict resolution mechanisms such as priority abort [Abb88, Abb92, Hua89, Hua91],
priority inheritance [Sha91, Hua91], priority wait [Hua91, Har90, Har91], and adjusting serialization
order [LS90, Son92]. Schemes that are based on a correctness criterion different from serializability
will be discussed in Section 4.

3.1. Locking-based Conflict Resolution

Concurrency control protocols induce a serialization order among conflicting transactions. For a
concurrency control protocol to accommodate timing constraints of transactions, the serialization order it
produces should reflect the priority of transactions. However, this is often hindered by the past execution
history of transactions. A higher priority transaction may have no way to precede a lower priority
transaction in the serialization order due to previous conflicts. For example, let $T_H$ and $T_L$ be two
transactions with $T_H$ having a higher priority. If $T_L$ writes a data object $x$ before $T_H$ reads it,
then the serialization order between $T_H$ and $T_L$ is determined as $T_L \rightarrow T_H$. $T_H$ can
never precede $T_L$ in the serialization order as long as both reside in the execution history. Most of the
current (real-time) concurrency control protocols resolve this conflict either by blocking $T_H$ until
$T_L$ releases the write lock or by aborting $T_L$ in favor of the higher priority transaction $T_H$.
Blocking of a higher priority transaction due to a lower priority transaction is contrary to the
requirement of real-time scheduling. Aborting is also not desirable because it degrades the system
performance and may lead to violations of timing constraints. Furthermore, some aborts can be wasteful
when the transaction which caused the abort is itself aborted due to another conflict.

Abbott and Garcia-Molina have proposed a restart-based 2PL [Abb88, Abb92]. It incorporates
priority information in lock setting so that transactions with higher priority will be given preference.
Whenever a higher priority transaction is in conflict with a lower priority transaction, the lower priority
transaction will be aborted and restarted later on. One of the weaknesses of this scheme is the impact of
restarts on scheduling other transactions to meet their timing constraints. Restarting a transaction could
be very costly in terms of wasted resources, and a large number of restarts increases the workload of the
system and may cause other transactions to miss their deadlines.

To reduce the number of restarts, the conditional restart protocol was proposed by the same authors. In
this protocol, the lower priority transaction will have to be restarted only if the slack time of the higher
priority transaction is smaller than the remaining execution time of the lower priority transaction that
holds the lock. There are a few problems with this protocol. First, the effectiveness of this checking is
greatly affected by the blocking probability of the lower priority transaction. Second, the scheduler must
have information such as the execution time and slack time. In real-time database systems, such
information is hard to get due to the dynamic nature of resource demands and the data-dependent execution
path of transactions. Furthermore, priority inversion and deadlock are still possible, although they have
a lesser degree of impact.
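
The conditional restart rule described above can be illustrated with a minimal Python sketch. The
`Txn` record, the `slack` helper, and the convention that a larger number means a higher priority are
assumptions made for illustration; they are not taken from [Abb88, Abb92]. The sketch also presumes that
execution-time estimates are available, which, as noted above, is often not the case in practice.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    priority: int          # assumed convention: larger value = more urgent
    deadline: float        # absolute deadline
    remaining_exec: float  # estimated remaining execution time


def slack(txn: Txn, now: float) -> float:
    """Time the transaction can afford to wait and still meet its deadline."""
    return txn.deadline - now - txn.remaining_exec


def resolve_lock_conflict(requester: Txn, holder: Txn, now: float) -> str:
    """Decide what happens when `requester` needs a lock held by `holder`."""
    if requester.priority <= holder.priority:
        return "wait"  # ordinary 2PL blocking, no priority inversion involved
    if slack(requester, now) >= holder.remaining_exec:
        return "wait"  # the holder can finish within the requester's slack
    return "restart_holder"  # otherwise fall back to priority abort
```

With a requester whose slack is 80 time units, a holder needing 30 more units is allowed to finish,
while a holder needing 90 more units is restarted.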

Huang et al. developed and evaluated a group of protocols for real-time transactions [Hua89,
Hua91]. Their study includes protocols for CPU scheduling, data conflict resolution, deadlock resolution,
transaction wakeup, disk scheduling, and transaction restart. In terms of conflict resolution, they
compared three approaches:

(1) priority inheritance, which eliminates the problem of CPU blocking (not data blocking) and
attempts to reduce the period of priority inversion by allowing the low priority transaction holding
the lock to execute at the priority of the highest priority transaction waiting for the lock,

(2) priority abort, which completely eliminates the problem of priority inversion as well as CPU
blocking by aborting the low priority transaction,

(3) conditional priority inheritance, which is a compromise between the two, taking the remaining
execution time of the low priority transaction into consideration.

Their performance study is not based on simulation but on actual implementation of those protocols
on a real-time database testbed called RT-CARAT. Their results indicate that CPU scheduling protocols
have a significant impact on the performance of real-time databases. They also found that the overhead
incurred in locking is non-negligible and hence must not be ignored in the analysis of real-time
transaction processing. For conflict resolution, their performance results indicate that with respect to
deadline guarantee ratio, the priority inheritance scheme does not work well, while conditional priority
inheritance and priority abort schemes perform rather well for a wide range of system workloads. They
clarified through experiments that the priority inheritance scheme is quite sensitive to the priority
inheritance period. A long priority inheritance period will affect not only the blocked higher priority
transactions but also other concurrent non-blocked higher priority transactions. The conditional priority
inheritance scheme works well because of its reduced priority inheritance period.

3.2. Optimistic Approaches

Recently, real-time concurrency control protocols based on the optimistic approach have been
proposed and studied [Har90, Hua91b, Son91]. Optimistic concurrency control (OCC) exploits the low
probability of data conflicts [Kun81]. It is a non-blocking protocol: the OCC scheduler uses abort and
restart to serialize concurrent data operations, thereby avoiding blocking. Consequently, OCC is free from
deadlock. In addition, it has a potential for a high degree of parallelism. These features of OCC make it
promising, particularly for real-time transaction processing. However, the abort-based conflict resolution
of OCC has the problem of wasted resources and time.

In OCC, write requests issued by transactions are not immediately processed on data objects but are
deferred until the transaction submits a commit request, at which time the transaction must go through the
validation phase. Because write operations effectively occur at commit time, the serialization order
selected by an OCC protocol is the order in which the transactions actually commit through the validation
phase. Transaction validation can be performed in one of two ways: forward validation or backward
validation.

In OCC protocols that perform backward validation, the validating transaction either commits or
aborts depending on whether it has conflicts with transactions that have already committed. Thus, this
validation scheme does not allow us to take transaction characteristics into account. In forward validation
[Hae84], however, either the validating transaction or conflicting ongoing transactions can be aborted to
resolve conflicts. This validation scheme is advantageous in real-time database systems, because it may
be preferable not to commit the validating transaction, depending on the timing characteristics of the
validating transaction and the conflicting ongoing transactions. A number of real-time concurrency control
methods based on OCC using the forward validation scheme have been studied [Har90, Hua91b, Son92].
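
The difference between the two validation styles can be sketched with read and write sets. This is a
minimal Python illustration under common textbook assumptions: backward validation compares the
validating transaction's read set with the write sets of transactions that committed during its read
phase, and forward validation compares its write set with the read sets of transactions that are still
active. The function names and the set representation are hypothetical.

```python
def backward_conflict(validating_reads, committed_writes):
    """Backward validation: the validating transaction must abort if its read
    set overlaps the write set of any transaction that committed meanwhile."""
    return any(validating_reads & write_set for write_set in committed_writes)


def forward_conflicts(validating_writes, active_reads):
    """Forward validation: return the names of active transactions whose read
    sets overlap the validating transaction's write set. The caller can then
    choose which side to abort, for example by comparing priorities."""
    return sorted(
        name for name, read_set in active_reads.items()
        if validating_writes & read_set
    )
```

Because `forward_conflicts` returns the conflicting active transactions instead of a yes/no answer, a
real-time scheduler retains the freedom to sacrifice either side, which is the property the text
identifies as the advantage of forward validation.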

Haritsa et al. proposed a group of optimistic real-time concurrency control protocols, based on
forward validation, and evaluated them on a simulation model [Har90]. When a conflict is detected during
validation, the priorities of the conflicting transactions are examined, and their fate is determined
according to the algorithm being used. If the validating transaction has a priority higher than all of the
transactions with which it conflicts, the validating transaction will commit, and all the conflicting ones
will abort. However, if some of the conflicting transactions have higher priority, the system can choose
one of the following options:

(1) OPT-Sacrifice: if at least one of the conflicting transactions has higher priority, then the validating
transaction is aborted.

(2) OPT-Commit: the validating transaction is always committed.

(3) OPT-Wait: it incorporates the priority wait mechanism such that the validating low priority
transaction will wait for the completion of conflicting high priority transactions.

The Wait-50 algorithm is an interesting extension of OPT-Wait: it incorporates a wait control
mechanism. In Wait-50, a simple "50 percent rule" is used, in which the validating transaction is made to
wait while more than half of the conflicting transactions have higher priority. Once this is no longer the
case, the remaining conflicting transactions are aborted, irrespective of their priorities, and the
validating transaction is committed. The goal of the wait control mechanism is to detect when the
beneficial effects of waiting, in terms of giving preference to higher priority transactions, are
outweighed by its drawbacks, in terms of late restarts and an increased number of conflicts. In other
words, it tries to avoid the loss of work already accomplished by the validating transaction. It is a
compromise strategy in the sense that we can control the amount of waiting based on transaction conflict
states. Their simulation study shows significant performance gains from Wait-50 over other choices.
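
The decision rules of the four policies can be compared side by side in a short Python sketch. This is
an illustration only, assuming that priorities are plain numbers with larger meaning more urgent and
that the conflict set is re-evaluated each time the validating transaction is reconsidered; the function
name and string labels are hypothetical and do not come from [Har90].

```python
def validate(policy, validating_priority, conflicting_priorities):
    """Return 'commit', 'abort', or 'wait' for the validating transaction.

    A 'commit' result implies that all still-conflicting transactions are aborted.
    """
    higher = [p for p in conflicting_priorities if p > validating_priority]
    if not higher:
        return "commit"  # the validating transaction outranks every conflict
    if policy == "OPT-Sacrifice":
        return "abort"
    if policy == "OPT-Commit":
        return "commit"
    if policy == "OPT-Wait":
        return "wait"
    if policy == "Wait-50":
        # wait only while more than half of the conflict set has higher priority
        if 2 * len(higher) > len(conflicting_priorities):
            return "wait"
        return "commit"
    raise ValueError(f"unknown policy: {policy}")
```

For a validating transaction of priority 5, a conflict set with priorities 9, 8, and 1 makes Wait-50
wait (two of three are higher), whereas a conflict set of 9, 1, 2, and 3 lets it commit.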

Haritsa et al. have also conducted a study on the relative performance of locking-based protocols
and optimistic protocols, and concluded that OCC protocols outperform two-phase locking-based
protocols over a wide range of system utilization. Huang et al. also conducted a similar performance study
of real-time OCC protocols, but on a testbed system, not through simulation [Hua91b]. They examined the
overall effects and the impact of the overheads involved in implementing real-time OCC on the testbed
system. Their experimental results contrast with the results in [Har90], showing that OCC may not
always outperform a two-phase locking-based protocol which aborts the lower priority transaction when a
conflict occurs. They pointed out that physical implementation schemes may have a significant impact on
the performance of real-time OCC protocols.

The rationale for OCC is based on an "optimistic" assumption regarding run-time conflicts: if only
a few run-time conflicts are expected, we can assume that most executions are serializable [Bern87].
Therefore OCC simultaneously avoids blocking and restarts in the optimistic situations. Unfortunately,
however, this optimistic assumption on transaction behavior may not always be true in real-world
situations. In a database system where run-time conflicts are not rare, OCC depends on transaction
restarts to eliminate nonserializable executions. The adverse effect of transaction restarts for
serialization is that resources and time are wasted. In OCC, because data conflicts are detected and
resolved only during the validation phase, a transaction can end up aborting after having used most of the
resources and time needed for its execution. When the transaction is restarted, previously performed work
has to be redone. This problem of time and resource waste becomes even more serious in real-time
transaction scheduling, because it reduces the chances of meeting transaction deadlines.

Another problem of OCC is that of unnecessary aborts. This problem is often caused by the
imperfect validation tests used in OCC protocols. Many validation test schemes are based on the
intersection of the read sets and write sets of transactions rather than on the actual execution order of
transactions, since in general it is difficult to record and use the entire execution history efficiently.
Hence a validation process using read sets and write sets sometimes erroneously concludes that a
nonserializable execution has occurred when it has not in the actual execution. Such a conflict can be
called a virtual conflict. A virtual conflict leads to one or more unnecessary transaction aborts. This
problem of unnecessary aborts also results in waste of resources and time, and is serious in real-time
transaction processing.
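
A virtual conflict can be made concrete with a small Python sketch. It assumes a backward-validation
setting in which the serialization order is the commit order, and it assumes that the time of each read
is known, which is exactly the history information that set-based tests avoid recording. All names are
hypothetical.

```python
def set_based_conflict(read_set, committed_write_set):
    """Conflict test that looks only at the read/write set intersection."""
    return bool(read_set & committed_write_set)


def order_based_conflict(read_times, committed_write_set, commit_time):
    """Conflict test that also uses execution order: a read clashes with the
    commit order only if it happened before the committed write was installed."""
    return any(
        t < commit_time
        for item, t in read_times.items()
        if item in committed_write_set
    )


def is_virtual_conflict(read_times, committed_write_set, commit_time):
    """True when the set-based test reports a conflict that the actual
    execution order does not bear out."""
    return (
        set_based_conflict(set(read_times), committed_write_set)
        and not order_based_conflict(read_times, committed_write_set, commit_time)
    )
```

If a transaction reads `x` at time 5 and another transaction that wrote `x` committed at time 3, the read
already saw the committed value, so the set intersection flags a conflict that is only virtual.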

3.3. Dynamic Adjustment of Serializability

Son et al. proposed the concept of dynamic adjustment of serialization order to provide better
service to high priority transactions [LS90, Son92]. Transactions write into the database only after they
are committed. By using a priority-dependent locking protocol, the serialization order of active
transactions is adjusted dynamically, making it possible for transactions with higher priorities to be
executed first so that higher priority transactions are not blocked by uncommitted lower priority
transactions, while lower priority transactions may not have to be aborted even in the face of conflicting
operations. The adjustment of the serialization order can be viewed as a mechanism to support
time-critical scheduling. The objective of this protocol is to avoid unnecessary blocking and aborting.

The protocol is similar to optimistic concurrency control (OCC) in the sense that each transaction
has three phases, but unlike the optimistic method, there is no validation phase. This protocol's three
phases are read, wait, and write. The read phase is similar to that of OCC, wherein a transaction reads
from the database and writes to its local workspace. In this phase, however, conflicts are also resolved by
using the transactions' priorities. While other optimistic real-time concurrency control protocols resolve
conflicts in the validation phase, this protocol resolves them in the read phase. In the wait phase, a
transaction waits for its chance to commit. Finally, in the write phase, updates are made permanent to the
database. The simulation study in [Son92c] indicates that this protocol offers a significant performance
improvement over 2PL with priority abort (also called the high priority scheme in [Abb88]). One of the main
reasons for the improved performance is the reduced number of "useless restarts" and "unnecessary
aborts".

4. Non-Serializable Real-Time Transactions

As we have discussed earlier, serializability may not be necessary for real-time transactions. To
facilitate more timely executions which meet their deadlines, we may extend the definition of correctness
in database transactions. Since real-time systems are used to respond to external stimuli (e.g. in combat
systems) or to control physical devices (e.g. in auto-pilot systems), a timely and useful result is much
more desirable than a serializable but out-of-date response. As long as the result of a transaction is
consistent with the situations of the real world, whether or not the database is internally consistent may
not be important to the application. Depending on the semantics and requirements of operations, a
real-time system may apply different protocols under different situations.

In this section, we review several techniques for generating non-serializable real-time
schedules. All techniques utilize some semantic information on the temporal or dependency relationship
between transactions or data objects. With this extra information, the system may produce a set of
schedules that are non-serializable but acceptable in the specific applications.

4.1. External and Temporal Consistencies

Many RTDBS are used to monitor and control physical devices and large complex systems. Since
the real world is always changing, it is up to the systems to ensure that their database subsystems are
always consistent with the real world. Ideally, a system should guarantee that the database always
contains good approximate values of their real-world counterparts. However, this may be too expensive to
implement for some applications since there may be too many values to be updated in the database.
Another, less expensive, solution is to make sure that all data read by a real-time transaction are close
approximates of their current real-world values. The latter approach may be easier to satisfy since only
the subset of the database currently used must be guaranteed to have up-to-date values.

Another important issue for real-time transactions is that of temporal consistency. Temporal
consistency is the constraint that all data constitute a real-world snapshot (i.e. they correspond to the
real-world facts of approximately the same time). If a transaction uses some new fact, mixed with old
facts, the transaction may have an erroneous picture about the real world and thus make wrong decisions.

A transaction $T_k$ is a sequence of distinct operations, $a_1, a_2, \ldots, a_n$. Each operation of a
transaction accesses only one object. An object, however, may be used in more than one operation in a
transaction. A transaction can thus be defined by a sequence of $(a_i, o_i)$ pairs. A history $H$ of a set
of transactions $T$ is thus a sequence of operations, $(a_i, o_i)$, from many transactions. Two histories
are equivalent if they have the same effect on the values of the database and on transaction results. A
serializable history is a history which is equivalent to a serial history (i.e. transactions are executed
sequentially).

To reason about the desirable history for real-time databases, we define a timestamp for each of the
transaction operations. The timestamp $s(a_i)$ of each operation in a history is defined to be the clock
time when the operation is performed. The timestamp $s(o_i)$ of the object version is the time when the
version is created.

Given a history of transaction executions, the external consistency requirement can be defined by
the following equation as in [LJ91]:

$$\forall i,\ |s(a_i) - s(o_i)| \le \epsilon_i$$

The equation specifies that, for each operation of a real-time transaction $T_k$, the data used by the
operation must be within the valid lifespan $\epsilon_i$ of the data. $\epsilon_i$ is dependent on the
nature of the operation $a_i$. Some operations may require their data to be very consistent with the real
world and therefore have a very small $\epsilon_i$ value. Others may have no concern for the validity of
their data and thus have $\epsilon_i = \infty$. In many practical applications we may have one single
value for all operations in a transaction. In other words,

$$\forall i,\ \epsilon_i = \epsilon_k$$

To check the temporal consistency of a transaction, we need to compare the timestamps of all
objects read by the transaction. In other words, a transaction $T_k$ may require that the timestamps of
all objects it reads have a difference not larger than $\beta_k$:

$$\forall i, j,\ |s(o_i) - s(o_j)| \le \beta_k$$

Sometimes, only the data in a data set have temporal consistency requirements. For example, the
three dimensional attributes of an aircraft location must be temporally consistent whenever they are used
in a computation. In that case, we can define a temporal consistency requirement for a data set $S$ in
terms of $\beta_S$:

$$\forall i, j,\ |s(o_i) - s(o_j)| \le \beta_S \quad \text{for } \{o_i, o_j\} \subseteq S$$
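
Both requirements reduce to simple timestamp comparisons, as the following minimal Python sketch
suggests. The function names and the list-based representation of a transaction's operations are
assumptions for illustration: each operation is given as a pair of its own timestamp and the timestamp of
the object version it accessed, and the bounds are passed in explicitly.

```python
from itertools import combinations


def externally_consistent(ops, lifespans):
    """ops: list of (operation timestamp, object-version timestamp) pairs.
    lifespans: one validity bound per operation, in the same order."""
    return all(
        abs(s_a - s_o) <= bound
        for (s_a, s_o), bound in zip(ops, lifespans)
    )


def temporally_consistent(version_times, bound):
    """True if every pair of object-version timestamps differs by at most `bound`."""
    return all(abs(a - b) <= bound for a, b in combinations(version_times, 2))
```

Restricting `version_times` to the timestamps of objects in one data set gives the data-set form of the
temporal consistency requirement.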

The idea of a temporally consistent data set can be compared to the Atomic Data Sets (ADS) proposed
by Rajkumar. In his thesis [Raj89], Rajkumar proposed an approach to decompose a database into
disjoint ADS, and use the modular concurrency control protocol [Sha88] for real-time database concurrency
control. The consistency of each ADS can be maintained independent of the other ADS's. A setwise two
phase locking protocol is then used to make sure that transactions are run serializably with respect to each
of the atomic data sets. However, no concept of temporal consistency is defined in ADS. We believe that
the two concepts are compatible and that some protocol can be implemented incorporating both of them.

An extensive set of simulations has been performed in [0.92] to study the performance of various
concurrency control algorithms in maintaining the temporal consistency of data in hard real-time systems.
A multiversion model is assumed in the study and the transactions are assumed to be mostly
periodic. The study compares the two-phase locking and the optimistic concurrency control algorithms,
and finds that the optimistic algorithm is poorer in maintaining temporal consistency.

4.2. The Compatibility Table

Based on the concept of internal and external consistencies, we can divide the actions in a real-time
transaction into two parts: those actions in the E-part to enter external events in the database (i.e. to
maintain external consistency) and those in the I-part to maintain internal consistency. Typically, a
transaction may start with actions in the E-part and conclude with actions in the I-part.

In some real-time applications, a transaction cannot afford to wait for the database to regain internal
consistency if the transaction has a stringent deadline requirement. In such cases, the system may allow
the internal consistency to be ignored temporarily in order to return a result before the transaction's
deadline. However, in real-time systems, such transaction executions must still maintain the external
consistency. One solution is to guarantee only the part of the transaction which maintains the external
consistency. The rest of the transaction may be executed at a lower priority after the result is produced
before the deadline.

A similar situation occurs when a transaction B is blocked waiting on an active transaction A to
finish. Although A has no deadline constraint, B must be finished before its deadline. In this case, B can
interrupt A as long as the E-part of A has finished. After B is executed, A may resume its I-part to
recover the internal consistency. Thus for B's execution, internal consistency is not guaranteed, but
external consistency is maintained. Since internal consistency usually has no time constraint, it may
be regained after the external deadline is met.

With the division of the I-part and E-part in each transaction, a transaction compatibility table
(TCT) can be defined in a system. During run-time, when real-time transactions are scheduled, the table
is inspected to see if they need to wait for transactions that arrived earlier. A transaction requiring an
externally consistent data set must wait until all updates by its predecessor transactions are finished. A
transaction is a predecessor transaction of $T$ if it updates some data value that will be used by $T$. The
set of predecessor transactions for $T$ is denoted as $pre(T)$. To decide whether a transaction $T_1$
should be executed after $T_2$, the entry $TCT(T_1, T_2)$ is examined. The entry has four possible values
as follows:

(1) $T_2 \in pre(T_1)$. $T_1$'s execution is dependent on $T_2$'s execution. So if $T_2$ is ahead of
$T_1$ in the scheduler queue, $T_2$ must always be executed before $T_1$.

(2) $T_2 \in pre(T_1)$ but $T_2$ contains an I-part which can be delayed. $T_1$ may preempt $T_2$ when
$T_2$ reaches an externally consistent breakpoint (i.e. external data have been entered).

(3) $T_2 \in pre(T_1)$ but $T_2$'s I-part can be skipped if $T_1$ has executed. Only the E-part of $T_2$
must be finished before $T_1$'s execution. The I-part of $T_2$ will not be executed at all.

(4) $T_2$ is not in $pre(T_1)$. It is acceptable to execute $T_1$ before $T_2$ even if $T_2$ is in front
of $T_1$ in the scheduler queue.
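
The four entry values can be represented as an enumeration stored in a sparse table, as in the
following minimal Python sketch. The names `Compat`, `TCT`, and `may_run_before` are hypothetical, and
the choice of case (1) as the default for pairs without an explicit entry is an assumption made here so
that arrival order is preserved unless the table says otherwise.

```python
from enum import Enum


class Compat(Enum):
    MUST_FOLLOW = 1            # case (1): T2 must run completely before T1
    PREEMPT_AT_BREAKPOINT = 2  # case (2): T1 may preempt T2 once T2's E-part is done
    SKIP_I_PART = 3            # case (3): only T2's E-part must precede T1
    INDEPENDENT = 4            # case (4): T2 is not in pre(T1)


class TCT:
    """Sparse compatibility table keyed by the ordered pair (T1, T2)."""

    def __init__(self, default=Compat.MUST_FOLLOW):
        self._entries = {}       # only non-default entries are stored
        self._default = default  # assumed conservative default

    def set(self, t1, t2, value):
        self._entries[(t1, t2)] = value

    def get(self, t1, t2):
        return self._entries.get((t1, t2), self._default)


def may_run_before(tct, t1, t2, t2_e_part_done):
    """True if t1 may start before t2 has completely finished."""
    entry = tct.get(t1, t2)
    if entry is Compat.INDEPENDENT:
        return True
    if entry in (Compat.PREEMPT_AT_BREAKPOINT, Compat.SKIP_I_PART):
        return t2_e_part_done
    return False
```

Because the key is an ordered pair, setting the entry for `("T1", "T2")` says nothing about
`("T2", "T1")`.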
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It  should  be  dear  that  TCT  is  not  synunetric.  Using  the  semantic  infonnatkn  in  a  TCT.  a  scheduler 
may  achieve  better  petfonnance  by  leairanging  the  transaction  sdiedule.  Suppose  a  tmsaction  T  is 
ready  to  be  executed,  and  there  are  several  other  transactions  waiting  to  be  executed  before  T.  The 
sdredukr  can  compute  the  expected  completion  time  for  7  if  all  ready  transactions  are  executed  in  the 
cuder  of  their  arrivaL  If7  can  meet  its  deadline  in  this  schedule,  no  adjustment  is  required  nd  7  will  be 
placed  at  the  end  of  the  schedule.  If  the  expected  cmni^etion  time  for  7  is  later  tium  its  deadlme.  the 
sdieduler  may  adjust  its  staitixtg  time  by  using  the  TCT  infonnation. 
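
The table lookup and the deadline test described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names Compat, TCT, Txn, must_precede, meets_deadline_fifo and movable_ahead are hypothetical, execution times are assumed to be known in advance, and pairs that are not stored in the table are conservatively treated as case (1), so that arrival order is kept.

```python
from dataclasses import dataclass
from enum import Enum


class Compat(Enum):
    """The four possible TCT entry values, numbered as in the text."""
    MUST_WAIT = 1          # T2 in pre(T1): T2 always runs first
    PREEMPT_AT_BREAK = 2   # T2 in pre(T1), but its I-part can be delayed
    SKIP_I_PART = 3        # T2 in pre(T1), but its I-part can be skipped
    INDEPENDENT = 4        # T2 not in pre(T1): order is free


class TCT:
    """Sparse table: only entries that differ from the default are stored."""

    def __init__(self, default=Compat.MUST_WAIT):
        # Assumption: an unlisted pair keeps its arrival order.
        self._entries = {}
        self._default = default

    def set(self, t1, t2, value):
        # The table is not symmetric: (t1, t2) and (t2, t1) are separate entries.
        self._entries[(t1, t2)] = value

    def get(self, t1, t2):
        return self._entries.get((t1, t2), self._default)


@dataclass
class Txn:
    name: str
    exec_time: float
    deadline: float


def must_precede(tct, t1, t2):
    """True if t2, queued ahead of t1, has to run completely before t1."""
    return tct.get(t1, t2) is Compat.MUST_WAIT


def meets_deadline_fifo(queue, txn, now=0.0):
    """Check txn's expected completion time if the queue runs in arrival order."""
    finish = now + sum(q.exec_time for q in queue) + txn.exec_time
    return finish <= txn.deadline


def movable_ahead(tct, queue, txn):
    """Names of queued transactions that txn may overtake outright (case 4)."""
    return [q.name for q in queue
            if tct.get(txn.name, q.name) is Compat.INDEPENDENT]
```

A scheduler built on this sketch would call meets_deadline_fifo first and consult movable_ahead only when the arrival-order schedule misses the deadline; cases (2) and (3) would need the I-part/E-part split, which is not modeled here.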

One of the issues in using the TCT approach is the size of the table. However, since most of the
transactions are probably incompatible, TCT is a sparse matrix in most cases. There are many efficient
data structures designed for sparse matrices so the size of the TCT should not be a problem. A more
unfamiliar issue is to divide each transaction into I-part and E-part. This can be solved by employing
some mechanical method in the compiler to analyze the data flow and separate I-part from E-part
automatically. The most tedious work involved in implementing TCT is in deciding the value for each
entry. This may require careful analysis of the semantics. More research is still needed to make the
approach easily usable.

4.2. Epsilon Serializability and Similarity

Epsilon serializability (ESR) [Pu91] is a generalization of serializability (SR) that explicitly allows a
limited amount of inconsistency in transaction processing. ESR enhances the degree of concurrency since
some non-SR executions are allowed. It allows users to bound the amount of temporary inconsistency. A
transaction with ESR as its correctness criterion is called an epsilon-transaction (ET). An ET is a query
ET if it consists only of reads. An ET containing at least one write is an update ET. Query ETs may see
an inconsistent data state produced by update ETs. An update transaction may export some inconsistency
when it updates a data object while query ETs are accessing the same data object.

ESR defines correctness for both consistent states and inconsistent states. In the case of consistent
states, ESR becomes SR. In addition, ESR associates an amount of inconsistency with each inconsistent
state, defined by its derivation (or a distance) from a consistent state. To an application designer, this
implies that each query ET has an import-limit, which specifies the maximum amount of inconsistency
that can be imported by it. Similarly, an update ET has an export-limit that specifies the maximum
amount of inconsistency that can be exported by it. The database system ensures that these limits are not
exceeded during the execution of ETs.
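
The bookkeeping implied by import-limits and export-limits can be sketched as follows. The classes QueryET, UpdateET and LimitExceeded are hypothetical names introduced for illustration, and measuring inconsistency as the absolute numeric difference of a value is an assumption; the text only requires some distance from a consistent state.

```python
class LimitExceeded(Exception):
    """Raised when an operation would push an ET past its inconsistency limit."""


class QueryET:
    """Read-only epsilon-transaction with an import-limit."""

    def __init__(self, import_limit):
        self.import_limit = import_limit
        self.imported = 0.0

    def read(self, value, pending_change):
        """Read value while an uncommitted update of size pending_change is in flight."""
        cost = abs(pending_change)
        if self.imported + cost > self.import_limit:
            raise LimitExceeded("import-limit would be exceeded")
        self.imported += cost
        return value


class UpdateET:
    """Epsilon-transaction containing writes, with an export-limit."""

    def __init__(self, export_limit):
        self.export_limit = export_limit
        self.exported = 0.0

    def write(self, old, new, readers):
        """Write new over old; inconsistency is exported only if queries are reading."""
        cost = abs(new - old) if readers > 0 else 0.0
        if self.exported + cost > self.export_limit:
            raise LimitExceeded("export-limit would be exceeded")
        self.exported += cost
        return new
```

In a real system a rejected operation would be blocked or retried rather than raised to the caller; the exception here only marks the point at which the limit is enforced.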

ESR has several important applications in real-time database systems. A concrete example of ESR
application is replication control in distributed real-time databases. It offers the possibility of maintaining
mutual consistency of replicated data asynchronously. A distributed real-time database which supports
ESR permits temporary and limited differences among data object replicas: these replicas are required to
converge to the standard one-copy serializability (1SR) as soon as all the update messages arrive and are
processed. A recent simulation study shows that a significant improvement in system responsiveness,
as measured by the number of transactions that meet their deadlines, can be achieved by using ESR in a
distributed real-time database system [Son92].

Another interesting application of ESR in real-time database systems is to use it as a value-based
correctness requirement. As long as the changes made to the value of a data object remain within a
specified limit, the system can allow an arbitrary order in accessing the data object by concurrent
transactions. For example, if an application can tolerate an imprecision of up to 5 meters in computing
the position of a moving object, a transaction can access the data object without considering the
order of other conflicting transactions, provided the inconsistency limit is ensured not to be violated.

A related approach to executing transactions without serializability is to allow similar data to be
updated or read in any order. If there are two updates that record similar values in a database, it makes no
difference to the future read operations which value is recorded first. This is the concept of similarity
proposed in [KM92]. Since a real-time database models an external environment that changes continuously,
the value of an object in the database can only be similar to its physical counterpart. Therefore, for each
data object we can define a region of values that are similar to a specific value. Moreover, two
states are similar if the corresponding values of every data object in the two states are similar.

With the above definition, two views of a transaction are said to be similar iff every read event in
both views uses similar values with respect to the transaction. A schedule is final-state similar to another
schedule if

(1) they are over the same set of transactions,

(2) for any initial state, they transform similar initial database states into similar output states.

Thus a system can produce a schedule which is similar to a serializable schedule as long as the two final
states are similar and the views of transactions in the two schedules are similar. Scheduling algorithms
using this technique are being investigated in [KM93].
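
A minimal sketch of the state-similarity test follows. It assumes that "similar" means "within a per-object tolerance", which is one possible reading rather than the definition of [KM92]; the function names are hypothetical. Note that a tolerance-based relation of this kind is not transitive in general.

```python
def similar(a, b, tol):
    """Two values are similar if they differ by at most tol (an assumed criterion)."""
    return abs(a - b) <= tol


def states_similar(s1, s2, tol):
    """Two states are similar if every data object has similar values in both."""
    return s1.keys() == s2.keys() and all(
        similar(s1[k], s2[k], tol[k]) for k in s1
    )


def apply_writes(state, writes):
    """Apply (object, value) writes in order; the last write to an object wins."""
    result = dict(state)
    for obj, value in writes:
        result[obj] = value
    return result


# Two updates that record similar positions, applied in both possible orders.
initial = {"pos": 99.0}
w1, w2 = ("pos", 100.0), ("pos", 100.4)
final_a = apply_writes(initial, [w1, w2])
final_b = apply_writes(initial, [w2, w1])
```

With a tolerance of 0.5 on "pos" the two final states count as similar, so either write order is acceptable; with a tolerance of 0.1 they do not.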

5. Conclusions

The design of RTDBS is still a wide-open area. Meeting the requirements of RTDBS will require a
balanced and coordinated effort between concurrency control and transaction scheduling. In this paper,
we have reviewed the issues and studied several approaches. The first approach is to combine existing
concurrency control protocols with real-time scheduling algorithms. To meet more deadlines, protocols
can be modified to favor more urgent transactions. The other approach in designing RTDBS is to define
the non-serializable semantics in real-time applications. Since RTDBS needs to maintain external and
temporal consistencies, and real-world data usually have continuously evolving values, we can employ
non-traditional measures to meet the deadline of real-time transactions.

RTDBS of tomorrow will be large and complex. They will be distributed, operate in an adaptive
manner in a highly dynamic environment, exhibit intelligent behavior, and may have catastrophic
consequences if certain logical or timing constraints of transactions are not met. In this paper we tried to
answer questions raised by some of the new characteristics. Meeting the challenges from all of the
characteristics would require more extensive and coordinated research efforts in many of the topics
below:

•  development of modeling techniques for distributed real-time transactions and databases to specify
timing properties and temporal consistency in an unambiguous manner. Validity of external data
consistency and relationships between consistency constraints and timing constraints need to be easily and
clearly specified.

•  development of priority-based scheduling protocols and concurrency control protocols that can, in an
integrated and dynamic fashion, manage transactions with precedence, resources (including
communication resources and I/O devices), and timing constraints. In particular, resource allocation policies
and distributed transaction management protocols must be integrated.

•  new models and protocols for database fault tolerance under real-time constraints. Since recovery by
"undoing" operations may not be applicable in many circumstances, a form of forward recovery may
be necessary. Moreover, systems may need to provide uninterruptible minimum services and continue
to function during recovery.

•  new architecture support for fault-tolerance, for efficient data management, and for time-constrained
communication. Important issues in new architecture include interconnection topology, interprocess
communications, and support of fault-tolerant database operations. It is essential to have hardware
support for fast error detection, reconfiguration and recovery. In addition, new architectures may
support real-time scheduling algorithms.


Divide & Conquer Strategies and Underlying Lossless Principles
Harold Szu, Edgar Cohen, and John Wingate
Naval  Surface  Warfare  Center  Dahlgren  Division, 

Silver  Spring/White  Oak,  MD  20309-5000 
Abstract 

The mathematical principle of global optimization is formulated for
massively parallel and distributed processors (MP&DP) by means of divide-and-conquer
strategies. The lossless principle, which is analogous to the incoherent
phenomenon in physics, is expressed in terms of a vector velocity V that is derived
from the least mean square (LMS) kinetic energy $E = |V|^2/2$. We illustrate the
nonlinearity that the sum of the best Traveling Salesman Problem (TSP) solutions
in subregions is not necessarily globally the best because the boundary resultant
vectors $V = A + B$ have a cross interaction term $(A,B) \ne 0$. In fact, a nearest
neighbor connection at the boundary cities represents a longer overall distance than
the next nearest neighbor connection between boundary cities. Then, a theorem of
orthogonal division error (ODE) for lossless divide-and-conquer (D&C) is proved,
and the orthogonal projection is constructed for solving TSP explicitly.

Keywords:  Nonconvex  Optimization,  Simulated  Annealing,  TSP,  Lossless  Divide- 
and-Conquer,  Boundary  Resultant  Vectors,  Orthogonal  Projections,  Recursive 
Algorithm 

1.  Introduction 

The  hypothesis  that  a  Divide  &  Conquer  (D&C)  Optimization  Strategy  shall 
work  for  massively  parallel  &  distributed  processors  (MP&DP)  depends  critically 
upon  the  existence  of  a  lossless  mathematical  principle  for  an  Instantaneous  & 
Distributed  Criterion  (I&DC)  allowing  all  divisions/processors  to  make  local 
decisions  which  contribute  positively  to  the  global  optimization  procedure.  In  other 
words,  there  should  be  no  requirement  during  execution  for  communication  with 
the  central  processor  which  would  constantly  assess  tradeoffs  among  the  decisions 
made by the local units.

This phenomenon has been referred to (by Szu in the 1987 Second
Supercomputing Conference at Boston [1]) as the Reporter bottleneck, namely,
"Who should do what, where, when, why, and how: 6 W's speed bottleneck." The
first bottleneck is about $10^9$ operations per sec (ops) due to serial machines.
(According to Einstein, the speed of light is $3 \times 10^{10}$ cm/sec, so a signal can cross a 30 cm
processor length only $10^9$ times per second.) Depending on MP&DP
paradigms, the second bottleneck was estimated as $10^{12}$ ops (about a thousand times
faster than the first), for most parallel digital computers are operated under a lock-step
and clock-cycle mode. The fact was benchmarked as follows: For both
Transputers and Hypercubes, the communication overhead costs retarded the
speedup factor, which revealed itself when plotted against the number of
processors [2]. The tradeoff between the communication cost and the actual
execution time will have a diminishing return as the number of processors
increases, see Fig. 1.

FIGURE 1. Speed up for Intel's Hypercube iPSC/1


However, before we present our solution in this case, we want to motivate
the importance of global optimization constrained by a minimum communication,
especially in the context when one has the advantage of using a large parallel
computer architecture. For example, ARPA has produced a class of Touchstone with
up to 532 processor nodes. It belongs to the class of Multiple Instruction Multiple
Data (MIMD) processors under a lock-step and clock cycle operation. Other style
computers exist, e.g., the sixth generation artificial neural networks (ANN) of ten billion
processors working asynchronously without clock cycle or lock steps. A brain-style
computer computes at the speed of a Mega-Cray, $10^6 \times 10^9$ ops $= 10^{15}$ ops (estimated for $10^{10}$
neurons x $10^4$ synaptic memory capacity x 10 ops each). We do not feel the need of a
clock cycle for the lock step communication and execution, because, if we heard tick-tock
in our brain as we think, we would have to see a psychiatrist. Then, what is the
underlying mathematics (of such a massively parallel brain-style computing) that
seems to break the barrier of the second bottleneck by three orders of magnitude?
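
The order-of-magnitude estimates quoted in this section can be checked with a few lines of integer arithmetic. The variable names are ours, and the inputs are the round figures used in the text (a speed of light of 3 x 10^10 cm/sec, a 30 cm processor, 10^10 neurons, 10^4 synapses per neuron, 10 ops per synapse).

```python
# First bottleneck: how many times per second light can cross the processor.
SPEED_OF_LIGHT_CM_PER_S = 3 * 10**10
PROCESSOR_LENGTH_CM = 30
first_bottleneck_ops = SPEED_OF_LIGHT_CM_PER_S // PROCESSOR_LENGTH_CM

# Second bottleneck: about a thousand times faster than the first.
second_bottleneck_ops = 1000 * first_bottleneck_ops

# Brain-style estimate: neurons x synapses per neuron x ops per synapse.
brain_ops = 10**10 * 10**4 * 10

# The gap the text asks about, expressed as a ratio.
gap = brain_ops // second_bottleneck_ops
```

The ratio comes out as 1000, that is, the three orders of magnitude mentioned above.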

We believe that such a principle, if it exists, must help minimize the
communication need in a so-to-speak lossless divide-and-conquer strategy. At this
point, such a strategy might seem to be solely created by the future generations of
computers, but in fact it is mathematically profound and underlies almost all real
world constrained optimizations: management, scheduling, resource allocation,
inventory, logistics, global optimization and the military focus on distributed
warfare, command & communication [3].




In this paper, a theorem of a lossless D&C strategy is given in Sect. 2 for the
LMS global optimization for the first time, and a TSP example is given in Sect. 3.


2. Theorem of Lossless Divide-and-Conquer

The least mean square (LMS) kinetic energy E is defined as

\[ E \equiv \tfrac{1}{2}\langle (V,V) \rangle \equiv \tfrac{1}{2}\langle |V|^2 \rangle \qquad (1) \]

where the angular brackets $\langle\ \rangle$ denote the statistical ensemble average and the round
brackets $(\ ,\ )$ the inner product in the Hilbert space. Note that in the continuum
medium the vector velocity V is defined through the gradient descent force F, whose
direction is that of the negative gradient of the energy: $-\mathrm{grad}\, E = F = dV/dt$.

While such a vector V is by Hamiltonian definition a locally conservative
quantity, the kinetic energy is an (intrinsically global) scalar quantity. In our new
methodology it is crucial to adopt the vector V as the linear distributable quantity.

In order to implement properly a divide-and-conquer scheme, one must be
able to utilize a distributed criterion. Mathematically speaking, this means that
ideally one wants to decompose the LMS problem into two LMS problems such that
the division is lossless. If this divide-and-conquer is possible recursively, there
must exist two orthogonal projection operators P and Q, such that

\[ P + Q = I, \qquad (2) \]

the identity operator. (It is not trivial and not always possible to construct P and Q as
they must be self-adjoint linear operators.) By such a decomposition of the problem,
one can thus eliminate the cross-correlation between the two parts as follows:


\[ V = IV = PV + QV \equiv A + B, \qquad (3) \]

\[ E \equiv \tfrac{1}{2}\langle (A + B,\, A + B) \rangle \qquad (4) \]

Theorem of Orthogonal Division Errors (ODE)

The divide-and-conquer in two subregions becomes lossless,

\[ \langle |V|^2 \rangle = \langle |A|^2 \rangle + \langle |B|^2 \rangle + 2\langle (A,B) \rangle \qquad (5) \]

when the cross-talk contribution of the boundary resultant vectors A and B of the two
subregions vanishes.

Proof:

The cross term $\langle (A,B) \rangle$ vanishes in two possible cases: (1) the deterministic
orthogonality,

\[ (A,B) = 0, \qquad (6) \]

(2) the random phase approximation, in which the statistical average becomes zero:

\[ \langle (A,B) \rangle \equiv 0. \qquad (7) \]

Since

\[ \text{Optimize}\, \langle |V|^2 \rangle = \text{Optimize}\, \langle |A|^2 \rangle + \text{Optimize}\, \langle |B|^2 \rangle + 2\, \text{Optimize}\, \langle (A,B) \rangle, \qquad (8) \]

then by construction, Eqs. (6,7), we have

\[ \text{Optimize}\, \langle |V|^2 \rangle = \text{Optimize}\, \langle |A|^2 \rangle + \text{Optimize}\, \langle |B|^2 \rangle. \qquad (9) \]

Since energy by definition is real and positive, the optima over the subregions
guarantee the optimum of the entire region. Furthermore, the orthogonal
decomposition minimizes the communication needs during LMS executions in
each subregion. QED
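
The role of the cross term in Eq. (5) can be seen on a small numerical example. The sketch below assumes the simplest possible orthogonal projections, namely coordinate projections on a four-dimensional vector; the vector V, the helper names, and the non-orthogonal split A2, B2 are illustrative choices, not taken from the paper.

```python
def inner(x, y):
    """Euclidean inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))


def project(v, keep):
    """Coordinate projection: keep components whose index is in keep, zero the rest."""
    return [c if i in keep else 0.0 for i, c in enumerate(v)]


V = [3.0, -1.0, 2.0, 4.0]

# Orthogonal split: P keeps coordinates 0 and 1, Q keeps 2 and 3, so P + Q = I.
A = project(V, {0, 1})
B = project(V, {2, 3})
cross = inner(A, B)                           # vanishes, as in Eq. (6)
energy_total = inner(V, V)
energy_parts = inner(A, A) + inner(B, B)      # equals energy_total

# Non-orthogonal split of the same V: the cross term no longer vanishes.
A2 = [3.0, -1.0, 1.0, 0.0]
B2 = [v - a for v, a in zip(V, A2)]
cross2 = inner(A2, B2)
energy_parts2 = inner(A2, A2) + inner(B2, B2) + 2 * cross2
```

In the orthogonal case the two parts can be evaluated independently; in the second case the sum of the two squared norms misses the total by twice the cross term, which is the quantity that would have to be communicated between the parts.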

Furthermore, the concept of the boundary resultant vector permits an
algorithmically recursive implementation as follows. Given V = A + B, each of A and B
can be further decomposed into a + a' and b + b', etc. In this fashion, one can in
principle estimate the boundary loss in the divide-and-conquer strategy when the
decomposition is no longer orthogonal.


3. Lossless Divide and Conquer of TSP

A case in point is that of the traveling salesman problem, in which one might
ask: Is it possible to divide the original set of cities into two parts such that a solution
to each part would be useful in obtaining a solution to the entire problem? Suppose
that our purpose is to find an optimal tour that passes once and only once through each of a set of
nodes (called cities), in a clockwise sense, such that the sum of the squares of the
distances between nodes is minimal. In Fig. 2 we have divided TSP into 4 quadrants
and used a modified Hopfield-Tank Artificial Neural Network to find a local
optimum solution for each quadrant. When we then tried to patch the solutions together at the
boundary, we discovered that the cut-and-splice at the nearest neighbor cities
denoted by C was not shorter than another cut-and-splice at the next nearest cities
denoted by D. This is completely against one's intuition. It points out the nonlinear
nature that the total is more than the sum of its parts.

[Figure: Solving Large-Scale Optimization Problems by Divide-and-Conquer Neural Networks]

[Figure: Pythagorean Theorem]


Then a proper division of the original set of nodes into subsets satisfies the
orthogonality requirement Eq. (6).


The global minimum is obviously obtained in this six-city example along the contour
around both regions. Several comments are given as follows.

(1) When the boundary resultant vectors A and B are not only orthogonal but
also touch each other, by the Pythagorean law, A and B are losslessly replaced by the
cut-and-splice boundary contour pointing from the head and bottom of the arrow vector
A to the head of B.

(2) When the boundary displacement vectors are not orthogonal, then the cross
terms should also be minimized. But the communication cost becomes important,
and could be compounded in a recursive revision of either A or B in its own
subregion. In fact, the antiparallelism $(A,B) < 0$ is sometimes preferred for the
global minimization.

(3) In other words, the orthogonal division error (ODE) theorem for the lossless
D&C strategy is necessary but not sufficient. If $\min |A|^2$ were not the best, there is
no way to be sure that the total $\min |A + B|^2$ will be the best, and the theorem is only
a strategy to minimize the boundary correlation and therefore the communication cost.

(4) Once a proper division is secured, the next problem to be addressed is that
of finding optimal tours through the two subsets (induced by the projections P and
Q). This is addressed by the simulated annealing algorithm in Sect. 4.

4.  Simulated  Annealing 

The concept of simulated annealing stems from the pioneering work of
Stuart and Donald Geman [5] in 1984, and the work of Kirkpatrick, Gelatt, and Vecchi
[6], published in 1983. The seeds of the endeavors of these scientists were in turn
due to the pioneering effort of Nicholas Metropolis, Arianna Rosenbluth, Marshall
Rosenbluth, Augusta Teller, and Edward Teller [7] in 1953. Thus, the fundamental
methodology is, at this point in time, forty years old, but the details of its systematic
theory have been addressed only relatively recently. The process is called simulated
annealing because its purpose is to emulate a well-known phenomenon
encountered in condensed matter physics based on statistical mechanics principles.
That specific purpose is the discovery of ground states of systems composed of large
numbers of atoms (typically of the order of $10^{23}$ per cubic centimeter).

Simulated  annealing  as  a  mathematical  concept  is  of  interest  in  that  it 
represents  a  systematic  approach  to  the  solution  of  a  large  class  of  nonconvex 
optimization  problems.  Whereas  most  optimization  techniques  for  deterministic 
problems  utilize  some  iterative,  deterministic  mechanism  or  strategy,  simulated 
annealing  employs  a  more  global  concept  which  is  closely  tied  to  probability  theory. 
Fundamental to this approach are two notions: state generation and state acceptance.
Note that classical methods invoke only the first part, namely, state generation, and
that, therefore, in the case of nonconvex problems, they often yield local rather
than global optima. The key to the success of simulated annealing as an
optimization principle lies in the state acceptance and, in particular, in the proper
coupling of generation and acceptance. Once that is accomplished, one obtains a
theory  of  convergence  for  global  optima,  and  the  task  is  therefore  that  of  enhancing 
the  rate  of  convergence  by  appropriately  combining  certain  generation  and 
acceptance  probability  distributions.  A  rigorous  convergence  theory  on 
multidimensional  lattices  has  been  developed  as  is  evidenced  by  two  papers  which 
appeared  in  the  Journal  of  Applied  Probability  [8]  [9].  It  is  possible  to  utilize  this 
methodology  on  continuous  variable  problems  simply  by  dividing  the  underlying 
space  into  regions  of  small  size  and  then  construing  these  subregions  as  entities  to 
which  the  generation  and  acceptance  laws  apply. 

As  originally  contemplated,  simulated  annealing  as  a  mathematical  concept 
was  devised  as  a  sequential  algorithm  useful  for  optimizing  some  energy  or  cost 
functions  on  a  multidimensional  lattice.  As  it  was  observed  that  the  method  was 
"slow  to  converge"  recent  work  has  been  concentrated  upon  expediting  the  process 
by  utilizing  different  distributions.  For  example,  thermodynamics  dictates  use  of  a 
Boltzmann  law  for  both  state  generation  and  acceptance,  and  that  works  properly  in 
conjunction  with  a  temperature  schedule  which  is  inversely  proportional  to  the 
logarithm  of  time.  To  enhance  the  rate  of  convergence,  Szu[12]  has  recommended 
using  a  Cauchy  distribution  for  state  generation  together  with  a  temperature 
schedule  inversely  proportional  to  time,  since  the  variance  of  a  Cauchy  random 
variable  is  infinite  [13].  Thus,  intuition  suggests  that  various  parts  of  the  landscape 
would be visited more rapidly, since a Cauchy law permits "random (Lévy) flights"
[14], in addition to (Wiener) random walks. It turns out that one must then be
careful to use an acceptance law compatible with the Cauchy generation law. A
recent discovery is that a Boltzmann law is not the proper one to choose, as then the
overall Markov process ceases to be ergodic. One needs to use either an acceptance
law which is temperature independent or one based perhaps on a modification of
the Cauchy distribution [15].
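
The interplay of state generation, state acceptance and temperature schedule can be sketched as follows. The annealing loop is the classical variant only (Gaussian generation, Metropolis acceptance, logarithmic schedule); the Cauchy sampler and the inverse-time schedule are shown as separate helpers for comparison and are deliberately not combined with the Metropolis rule, since the text notes that a Boltzmann acceptance law is not the proper partner for Cauchy generation. All function names, the parameter t0 and the default step count are illustrative assumptions.

```python
import math
import random


def boltzmann_temperature(t0, step):
    """Logarithmic schedule, T = T0 / ln(1 + step), for step >= 1."""
    return t0 / math.log(1.0 + step)


def cauchy_temperature(t0, step):
    """Fast schedule, inversely proportional to time: T = T0 / (1 + step)."""
    return t0 / (1.0 + step)


def cauchy_step(temperature, rng):
    """Draw a jump from a Cauchy distribution whose width is the temperature."""
    return temperature * math.tan(math.pi * (rng.random() - 0.5))


def anneal(cost, x0, t0=1.0, steps=2000, seed=0):
    """Classical annealing on a one-dimensional cost function."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    for step in range(1, steps + 1):
        temp = boltzmann_temperature(t0, step)
        # State generation: Gaussian jump whose spread shrinks with temperature.
        candidate = x + rng.gauss(0.0, math.sqrt(temp))
        fc = cost(candidate)
        # State acceptance: always take improvements, sometimes take uphill moves.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```

A call such as anneal(lambda x: x**4 - 3*x**2 + x, 2.0) runs the loop on a simple nonconvex function; the returned cost is never worse than the starting cost because the best state seen is tracked separately from the current state.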


[Figure: Simulated Annealing and Fast Simulated Annealing]


With  the  advent  of  modern  digital  computers  and,  in  particular,  with  the 
production  of  massively  parallel  and  distributed  computers  (MP  &  DP),  there  has 
been  a  shift  in  emphasis  toward  that  of  designing  optimization  algorithms  which 
can  take  advantage  of  this  capability.  In  particular,  one  would  like  to  capitalize  upon 
the  use  of  such  a  parallel  architecture  by  dividing  the  workload  among  the  different 
processors.  In  this  way  perhaps  the  speed  of  the  optimization  technique  would  not 
be  so  critical,  since  the  method  would  be  employed  by  any  given  processor  on  only  a 
small part of the underlying space.

There are indeed two types of optimization problems to be addressed by a
multiprocessor architecture: (1) those which, relative to the optimization criterion,
can be naturally divided into subproblems whose optimal states are directly related
in some manner to the global optimal state. In other words, the global optimum is
some function of the optima for the subproblems and (2) those for which a global
optimum is not achievable in this way, but, nevertheless, there may be some
function of the subproblem optima which yields a satisfactory, though not strictly
global, optimal state (a so-called suboptimal state). Under the second possibility,
another issue arises, namely, that, by changing the objective function itself, one
might be able to convert a problem of type (2) into one of type (1). A primary focus of
our research is to discover the proper objective functions, if they exist, which are
naturally related to the globally optimal states in the sense that decisions made
locally by the individual elements of a massively parallel architecture are sufficient
to obtain the global optimum, thus obviating the need for communication with any
central processing facility. The use of force rather than energy as an objective
function could be useful here, since force is fundamentally a local concept.

There  has  been  a  fair  amount  of  work  dating  from  the  1970's  devoted  to  the 
partitioning  of  directed  graphs.  Such  work  may  be  directly  relevant  to  parallel 
simulated  annealing.  For  example,  it  should  be  possible  to  utilize  some  of  the 
heuristic procedures given in this literature, together with simulated annealing, to
devise  "good  solutions"  to  optimization  problems.  Thus,  efficient  hybrid 
approaches  would  be  developed  which  would  expedite  the  search  for  satisfactory 
solutions  to  NP-hard  problems. 

Let us illustrate the hybrid approach by appealing to concepts extracted from
papers of the type just mentioned. The first paper to be mentioned is one from
Kernighan and Lin which actually appeared in 1970 [10]. The main problem
addressed in this paper was the following: Partition the nodes of an undirected
graph with costs on its edges into subsets of given sizes so as to minimize the sum of
the costs on all edges cut. Two applications of the methodology developed in this
paper are: (1) Planning of circuit boards and (2) Computer paging properties. We
proceed to some implementation details. Let G be a graph of n nodes of sizes
(weights) $w_i$, $1 \le i \le n$, and p a positive number such that $0 < w_i \le p$ for all i. Suppose
that $C = (c_{ij})$, $1 \le i, j \le n$, is the cost matrix (weighted connectivity matrix). Now let us
define a k-way partition of G. One has k subsets $V_i$, $1 \le i \le k$, of vertices of G such that
$V_i \cap V_j = \emptyset$ for $i \ne j$, and such that $\bigcup_i V_i = G$. Furthermore, defining $|V_i|$ to be the
number of vertices in $V_i$, an admissible partition is one for which $|V_i| \le p$ for all i.
The cost of a partition is just $\sum c_{ij}$ over all unordered pairs (i, j) with $i \in V_a$, $j \in V_b$, and
$V_a \cap V_b = \emptyset$ (that is, with i and j in different subsets). The authors note that
minimizing external cost is equivalent to maximizing internal cost. Furthermore, supposing
that kp = n and that one has the task of partitioning G into k subsets of size p, one finds that
the total number of cases to be considered is:

\[ \frac{1}{k!} \binom{n}{p} \binom{n-p}{p} \cdots \binom{2p}{p} \]
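
The count given by the formula above can be evaluated exactly with integer arithmetic. The function name count_partitions is ours; the sketch assumes that p divides n, as in the text (kp = n).

```python
from math import comb, factorial


def count_partitions(n, p):
    """Number of ways to split n nodes into k = n/p unlabeled subsets of size p."""
    if n % p != 0:
        raise ValueError("p must divide n")
    k = n // p
    ordered = 1
    remaining = n
    for _ in range(k):
        # Choose the next subset of size p from the nodes still unassigned.
        ordered *= comb(remaining, p)
        remaining -= p
    # The k subsets are interchangeable, so divide out their k! orderings.
    return ordered // factorial(k)
```

For a small case, count_partitions(4, 2) gives 3, matching the three ways of pairing up four nodes; count_partitions(40, 10) is a 21-digit number.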

For n = 40 and p = 10 (k = 4), this count exceeds $10^{20}$ cases! Clearly, one should, in this
framework, generally contemplate heuristic solutions. To do so, Kernighan and Lin
first consider two-way uniform partitions, wherein the problem is to find a
minimal-cost partition of a given graph of 2n vertices into two sets of n vertices
each. In mathematical terms this may be phrased as follows:

Let S be a set of 2n points, $C = (c_{ij})$, $i, j = 1, \ldots, 2n$. Assume that C is symmetric and
that $c_{ii} = 0$ for all i. The quantity $c_{ij}$ is unrestricted in sign. One wants to minimize the
external cost $T = \sum_{a \in A,\, b \in B} c_{ab}$, where $A \cup B = S$, $A \cap B = \emptyset$, $|A| = |B| = n$. The essence of the
method is to start with an arbitrary partition (A, B) of the set of nodes S and to try
to decrease the initial external cost T by a series of interchanges of subsets of A and B.
Kernighan and Lin note that a minimum cost 2-way partition is derivable from
(A, B) by extracting a certain subset X from A and a certain subset Y from B and then
interchanging the two to produce an optimal partition $(A^*, B^*)$.

[Diagram: interchange of the subsets X of A and Y of B]

Therefore, $A^* = (A \setminus X) \cup Y$, $B^* = (B \setminus Y) \cup X$, where $|X| = |Y| \le n/2$. They propose
an  optimization  algorithm  for  accomplishing  this  in  a  systematic  manner  without 
having  to  consider  all  possible  pairs  (X,Y)  directly.  This  whole  process  (a  kind  of 
divide-and-conquer  approach)  may  clearly  be  repeated  in  order  to  secure  a  k-way 
partition  of  the  original  graph  G.  However,  there  is  an  attractive  hybrid  approach 
which  would  utilize  simulated  annealing.  As  noted  before,  minimizing  external 
cost  for  any  k-way  partition  is  equivalent  to  maximizing  the  total  internal  cost  of  the 
k  subgraphs.  This  remark  is  valid,  because  the  total  cost  for  the  original  graph  is 
constant.  Therefore,  one  may  employ  the  heuristic  mechanism  of  Kernighan  and 
Lin  for  i  steps  of  their  procedure,  afterwards  appealing  to  simulated  annealing  to 
complete  the  whole  process.  The  philosophy  here  is  that,  once  the  number  of 
nodes  has  been  satisfactorily  reduced,  simulated  annealing  may  be  a  useful  tool 
with  regard  to  both  its  accuracy  and  speed,  especially  when  fast  simulated  annealing 
is  employed.  Another  viewpoint  may  also  be  adduced  at  this  point  when  problems 
such  as  TSP  (the  traveling  salesman  problem)  are  considered.  It  seems  reasonable 
that  good,  though  clearly  not  optimal,  solutions  should  be  obtained  through  use  of 
k-way  minimal  cut  partitions,  since  in  TSP  one  seeks  a  minimal  cost  tour. 
Therefore, once such a k-way partition is obtained via the K-L procedure, it would be
reasonable  to  invoke  "parallel  simulated  annealing"  on  the  k  subgraphs  and  thus 
fuse  the  results  to  secure  a  solution  to  TSP. 
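The interchange idea can be made concrete with a small sketch. The Python below is not Kernighan and Lin's full procedure: it only computes the external cost T of a two-way partition and the gain of exchanging a single pair (a in A, b in B), using the difference between each node's external and internal cost, and it repeats the best single exchange while that lowers T. The function names and the toy cost matrix are our own assumptions, and both sets are assumed non-empty.

```python
from itertools import product

def external_cost(c, A, B):
    """T = sum of c[a][b] over all a in A and b in B."""
    return sum(c[a][b] for a, b in product(A, B))

def d_value(c, v, own, other):
    """External minus internal cost of node v."""
    ext = sum(c[v][u] for u in other)
    internal = sum(c[v][u] for u in own if u != v)
    return ext - internal

def best_swap(c, A, B):
    """Return (gain, a, b) for the single pair exchange that reduces T the most."""
    best = None
    for a, b in product(A, B):
        gain = d_value(c, a, A, B) + d_value(c, b, B, A) - 2 * c[a][b]
        if best is None or gain > best[0]:
            best = (gain, a, b)
    return best

def greedy_pair_swaps(c, A, B):
    """Repeat the best single exchange while it strictly lowers the external cost."""
    A, B = set(A), set(B)
    while True:
        gain, a, b = best_swap(c, A, B)
        if gain <= 0:
            return A, B
        A = (A - {a}) | {b}
        B = (B - {b}) | {a}

# Toy graph: heavy edges 0-1 and 2-3, light edges 0-2 and 1-3.
c = [[0, 5, 1, 0],
     [5, 0, 0, 1],
     [1, 0, 0, 5],
     [0, 1, 5, 0]]
A, B = greedy_pair_swaps(c, {0, 2}, {1, 3})
print(external_cost(c, {0, 2}, {1, 3}), "->", external_cost(c, A, B))  # 10 -> 2
```

A greedy single-pair exchange like this can stall in a local minimum, which is the situation the hybrid with simulated annealing described above is meant to address.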

With  regard  to  the  TSP  in  particular,  we  would  finally  like  to  mention  some 
results  of  an  old  paper  by  Richard  Karp  [11],  in  which  a  probabilistic  analysis  of 
partitioning algorithms for TSP was conducted. Using a simple partition scheme to
divide a rectangle enclosing the original cities into $2^k$ subrectangles each containing
at most t cities, where t was given a priori, he showed the following:

There exists a family of algorithms with the property that, for every ε > 0,
there is an algorithm A(ε) in the family such that (a) A(ε) runs in time C(ε) n + O(n
log n); (b) with probability 1, A(ε) produces a tour costing not more than (1+ε) times
the cost of an optimal tour. Consistent with the philosophy presented here, his idea
was to partition the original region X (a rectangular region enclosing all cities) into
"small" subregions, each of which contains about t cities. Then an optimal tour was
to be constructed within each subregion (using a computer program TOUR), and the
subtours were to be combined to yield a tour through all the cities. Again one can
envision  the  possibility  of  using  parallel  simulated  annealing  in  conjunction  with 
the  Karp  algorithms. 

A simple heuristic rule of Lin is based on the triangle inequality of the
Pythagorean law, A + B ≥ C. Wherever a tour path crosses itself, one applies
the inequality to uncross the path. This is illustrated with 100 cities over a unit
square: a random tour path gives a total distance of about 50, which is reduced to
11.1 when some crosses are untangled; when all are eliminated, one obtains 8.89.
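The uncrossing rule is commonly implemented as a 2-opt move: two tour edges (a,b) and (c,d) are replaced by (a,c) and (b,d), by reversing the tour segment between them, whenever that shortens the tour. The Python sketch below is such a 2-opt pass under our own assumptions (random points, a fixed seed, hypothetical function names). It is not the code behind the distances quoted above, so its numbers will differ.

```python
import math
import random

def tour_length(pts, tour):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(pts, tour):
    """Reverse segments while any exchange of two edges shortens the closed tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Triangle inequality: the uncrossed pair of edges is shorter.
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(100)]
start = list(range(100))            # cities in random order form a random tour
print(tour_length(pts, start), tour_length(pts, two_opt(pts, start)))
```

Each accepted move strictly shortens the tour, so the loop terminates; the result is free of crossings but is not guaranteed to be optimal.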

5. Conclusion

As always, it seems that all truth is simple in historical hindsight. There is
no exception for such a lossless Divide-and-Conquer strategy. Nevertheless, we
have never come across it before. It is possible for us to discover the ODE because of
two basic realizations: (1) forces, such as the resultant displacement vectors V ≡ A + B in
TSP, are local quantities, but the energy, such as the squared-distance kinetic energy
E ≡ (1/2)|V|^2, is global and scalar; and (2) the local vector quantity is much easier to
distribute, with much lower communication costs. We shall make several
interesting philosophical comments before our closing remarks.

(1) Phase transition: This Divide-and-Conquer strategy, when pushed
to the extreme of the microscopic world, is not unlike the physical annealing
phenomena in a phase transition. As a working simulated annealing model of
molecular computing for the global-minimum crystalline ice state, we must adopt
the force (the van der Waals Coulomb force), which requires minimal or no
communication from the central processor (Mother Nature) and gives a local
acceptance criterion, "against peer pressure (force)," rather than climbing the energy
landscape (energy), at a temperature higher than the transition temperature, in
order to make distributed decisions that avoid a local minimum. This reliance on local
force rather than global energy eventually endows the system with the ability to find the
global energy state through a self-organized criticality.

(2) Incoherent Sum: The lossless principle is consistent with our
common sense that the IRS is to make the US rich in its global optimization fashion while
each individual citizen wishes to be locally optimized to be rich as well. The
theorem says that this is possible only when no conflict exists, and hence minimum
communication is needed. Consequently, for a mutually dependent society, such a
complete alignment of national and individual interests is unlikely, as would be
predicted by the difficulty of fulfilling the condition of the lossless D & C strategy, $V = \sum_j V_j$,
which guarantees that individual optimization gives the global optimization:

$$\text{Optimize}\,\langle |V|^2 \rangle = \sum_j \text{Optimize}\,\langle |V_j|^2 \rangle,$$

because each term is real and positive (an incoherent intensity sum, as for a thousand
points of light), when the lossless orthogonal division error (ODE) is satisfied.
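One way to read this condition, offered here as our interpretation rather than as the authors' derivation, is to expand the squared magnitude of the total vector into per-piece terms and cross terms:

```latex
|V|^2 = \Big|\sum_j V_j\Big|^2 = \sum_j |V_j|^2 + 2\sum_{i<j} V_i \cdot V_j .
```

If the pieces are mutually orthogonal, so that $V_i \cdot V_j = 0$ for $i \ne j$, the cross terms vanish and $|V|^2 = \sum_j |V_j|^2$. Each term is then non-negative and can be optimized on its own without affecting the others.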

(3) NP-Complete: To elaborate further on the degree of difficulty of finding a
general solution of the lossless D & C strategy, we mention that the whole class of
computationally intractable problems, e.g., the NP-complete problems
(nondeterministic polynomial time) such as the Traveling Salesman Problem (TSP)
or NP-hard problems such as the four-color mapping problem, would be
theoretically solved if one were successful in finding a deterministic procedure for a
lossless D&C optimization strategy. To emphasize the mathematical significance of
this lossless D&C ODE strategy, we wish to address the general challenge as the 11th
problem of Hilbert beyond the celebrated ten problems, or the von Neumann
second bottleneck problem, or Fermat's last theorem (as given to Mersenne in
1643, that no integers x, y, z exist such that $x^2 + y^2 = z^2$ satisfies the Pythagorean law and z
is a square number and (x+y) is too), of which the boundary resultant vectors A and
B are perhaps a special case. We believe that, whatever it is called, it remains one
of the central challenges of 20th Century computer civilization and of 21st
Century computerization.

In summary, we have found a mathematical principle for a lossless Divide-and-Conquer
strategy by minimizing the communication need. Also, in this paper,
we have found a nontrivial practical example, TSP, to illustrate the lossless principle.

ABSTRACT

This paper describes the role of a simulation testbed in analyzing the non-steady state behavior of a fault
tolerance protocol underlying the design of the Federal Aviation Administration's (FAA) next generation
air traffic control system. The modeled protocol, the group membership protocol, is designed to ensure
the consistency of state information among processors that support fault recovery through hardware and
software redundancy. The modeling objective was to test the robustness of the protocol in the presence
of faults and forward detected problems to the developers for resolution; modeled fault scenarios focused
on performance faults attributable to late or lost messages and timers that did not expire on time or at all.
The model was implemented using GPSS/H™ and successfully identified fault scenarios in which
protocol behavior was deemed unacceptable, thereby resulting in modifications to the protocol.

Keywords:  Fault  Tolerance,  Group  Membership  Protocol,  Fault  Injection,  Simulation 

1.  Introduction 

This  paper  describes  the  combined  use  of  discrete-event  simulation  and  fault  tree  analysis  in  evaluating  a 
distributed  protocol,  the  group  membership  (GM)  protocol,  when  injected  with  simulated  faults.  The 
GM  algorithm  is  an  essential  element  of  the  Federal  Aviation  Administration's  (FAA)  Advanced 
Automation  System  (AAS)  program  for  modernizing  the  air  traffic  control  (ATC)  system.  A  key 
requirement  of  the  AAS  is  high  availability.  In  AAS,  downtime  (the  time  critical  services  are  not 
available)  is  limited  to  only  a  few  seconds  per  year.  If  the  GM  protocol  fails  to  function  correctly  and 
consistently, then the availability requirements for the AAS cannot be met.

As  an  essential  building  block  of  the  AAS  fault  tolerance  scheme,  the  GM  algorithm  is  designed  to 
ensure  that  each  member  of  a  group  of  processors  has  the  same  "view"  of  the  state  of  all  the  processors 
in  the  group  (i.e.,  working  or  failed).  For  example,  if  a  hardware  or  software  element  (server)  providing 
a  specific  service  fails,  then  that  event  will  be  detected  by  all  processors  so  that  appropriate  recovery 
actions  can  be  initiated.  The  state  consistency  problem  is  a  classical  problem  in  distributed  computing 
whose  implementation  in  AAS  must  satisfy  stringent  real  time  needs. 

While  it  is  possible  to  analyze  the  behavior  of  fault  tolerance  protocols  using  timelines  for  simple 
applications, this approach becomes infeasible for more complex systems containing multiple processors
and large numbers of concurrent events. Another approach one could pursue is extensive laboratory
testing. Although limited testing in the laboratory has been conducted for the GM protocol, laboratory
time is scarce and it is often difficult and time consuming to instrument failure scenarios, for example,
those requiring multiple and nearly simultaneous faults. Consequently, a well-designed and accurate
model of the protocol offers the advantage of assessing many fault scenarios with minimal use of
laboratory  resources.  Regardless,  the  objective  of  our  analysis  effort  was  to  identify  fault  scenarios 
which  resulted  in  GM  protocol  behavior  that  violated  the  software  developer's  design  guidelines. 

The remainder of this paper is organized as follows. Section 2 outlines the operation of the GM protocol.
Our GPSS/H [1,2] simulation model of the protocol is described in Section 3. Section 4 summarizes an
example of a protocol deficiency that was identified using the simulation. Finally, our conclusions
appear in Section 5.

2.  The  Group  Membership  Protocol 

We  begin  by  providing  some  background  information  for  the  GM  protocol  and  then  describe  its 
operation  in  detail. 

2.1  Background 

A  fundamental  concept  for  ensuring  high  availability  of  a  computing  service  is  the  replication  of  state 
information on separate processors [3]. Upon recognizing the failure of a peer server on one processor,
the surviving servers on the remaining processors have enough state information to resume the work of
their failed peer. In order for this scheme to work in practice, the replicated servers must achieve
agreement on the global state in the presence of random communications delays, component (hardware
and software) faults, and server joins (i.e., the addition of new servers). The purpose of the GM
algorithm is to achieve this agreement. The theoretical work underlying the development of the GM
protocol was performed by Flaviu Cristian while at IBM's Almaden Research Laboratory in San Jose,
California [4].

In the AAS design, sets (typically, 2-4) of homogeneous processors (designated groups) provide the
hardware redundancy necessary for high availability air traffic applications (Figure 1).

Figure 1: AAS Hardware Redundancy Groups

Transmission of information among processors is accomplished through a common transmission
medium. Both clients which request service and servers that provide service are replicated in designated
partitions of processor memory called address spaces. The primary copy of client or server software
resides in the primary address space (PAS) on one processor while the standby or backup copy (SAS)
resides on a separate processor. In Figure 1, the group consists of three processors A, B, and C with
client PASs residing on A and C,¹ respectively. The corresponding SASs are replicated on B and A.
Clients communicate with servers via service requests and responses are returned in service request
responses. The gray colored arrows from PAS to SAS denote the periodic flow of status information
required to ensure that the SAS is sufficiently current to assume the primary processing responsibilities
in the event of PAS failure. If the client PAS on A or A itself fails, for example, then the PAS
processing will "switchover" to the client SAS on B.

¹ The PASs on A and C provide different services.

The  GM  protocol  executes  concurrently  in  each  group  member.  It  is  designed  to  detect  changes  in  group 
composition, propagate those changes, and ensure that the resulting updates are consistent among all
members  even  in  the  presence  of  faults.  Since  servers  are  supported  by  hardware  resources,  processor 
failures  imply  server  loss  but  the  converse  is  generally  not  true.  In  terms  of  this  distinction,  the  GM 
protocol  is  concerned  with  the  availability  of  the  processors  that  support  those  servers.  Each  group 
member  maintains  a  set  of  state  information  termed  the  membership  view  which  contains  the 
identification  (ID)  number  of  each  processor  it  believes  is  in  the  group  and  the  time  that  its  membership 
view  last  changed.  Generally,  the  membership  view  changes  whenever  processors  join  or  depart  the 
group. A processor may join a group during initial group formation or after being repaired; a processor
departs  if  commanded  or  if  it  has  failed. 

Regardless  of  the  cause,  membership  changes  detected  by  one  member  are  propagated  on  the  network  to 
the  remaining  group  members  via  group  commands  and  incorporated  into  the  views  of  all  members  at  an 
agreed  upon  future  time.  The  effect  of  processing  a  group  command  is  that  all  group  members  update 
their views at (nearly) the same time¹ and function as if they were a single logical processor.

2.2  Protocol  Description 

The  GM  algorithm  consists  of  two  processes.  The  first  process  (Figure  2a),  termed  Steady  State,  detects 
the  failure  of  existing  group  members.  Steady  State  uses  periodic  timers,  called  Roll  Call  (RC)  and 
Validation (VAL), which are synchronized² among group processors. When the RC timer expires, the
VAL timer is started and each group member sends an Accept Roll Call (ARC) message to the other
members of the group. The ARC message contains the membership view of its author and signifies that
the sender is still working. Upon receipt of an ARC message, the recipient notes that the ARC author
has reported. When the VAL timer expires, each processor evaluates its membership and marks as failed
those group members which have not reported. The RC timer is then restarted and the cycle continues.
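As a rough illustration of one Steady State cycle, the Python sketch below models a single roll call and validation for a small group. It is a simplification under our own assumptions: message delivery is reduced to a set of (sender, receiver) pairs whose ARC arrived before the VAL timer expired, and the names `validate` and `arrived` are hypothetical rather than taken from the AAS software.

```python
def validate(views, arrived):
    """One validation step.

    views:   dict mapping processor -> set of processors in its membership view
    arrived: set of (sender, receiver) pairs whose ARC reached the receiver
             before the VAL timer expired
    Returns the updated views and the processors whose view changed.
    """
    new_views, changed = {}, []
    for proc, view in views.items():
        reported = {s for (s, r) in arrived if r == proc} | {proc}
        updated = view & reported        # members that did not report are marked failed
        if updated != view:
            changed.append(proc)         # this processor now has a discrepant view
        new_views[proc] = updated
    return new_views, changed

group = {"A", "B", "C", "D"}
views = {p: set(group) for p in group}
# Every ARC arrives on time except the one from C to B.
arrived = {(s, r) for s in group for r in group if s != r} - {("C", "B")}
new_views, changed = validate(views, arrived)
print(sorted(new_views["B"]), changed)   # ['A', 'B', 'D'] ['B']
```

Only B's view changes here, which is the situation in which a single processor ends up disagreeing with the rest of the group.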

The  absence  of  an  ARC  is  the  mechanism  by  which  one  processor  infers  the  failure  of  another. 
Assuming  a  group  of  four  processors  (A,  B,  C,  D),  the  first  validation  time  (1.0)  in  Figure  2a  shows  the 
case  where  all  processors  (only  A's  membership  view  shown)  have  received  ARC  messages  from  the 
other  three  and,  hence,  have  the  same  membership  view.  Each  group  member  now  waits  for  the  next 
expiration  of  its  RC  timer.  The  second  validation  event  shows  the  case  where  processors  A,  C,  and  D 
(only A shown) have received all expected ARCs, but processor B has not received an ARC from
processor  C  (for  example,  due  to  a  performance  fault)  before  validation  time;  as  a  result,  B  deletes  C 
from  its  view  (now  ABD)  and  broadcasts  a  Process  Membership  Check  (PMC)  message. 

The  PMC  message  is  broadcast  to  all  group  members  whenever  an  existing  group  member  updates  its 
view  and  indicates  that  there  is  a  discrepancy  among  membership  views  that  must  be  resolved.  The  PMC 
message  contains  the  membership  view  (membership  list  plus  last  change  time)  of  its  author.  The 
author’s  view  is  then  compared  to  the  receiver’s  view;  if  the  views  are  the  same  no  action  is  taken.  If 
they  differ,  we  must  decide  which  view  is  better. 


¹ Note that the "atomic effect" of group commands assumes that the clocks of all group members are
synchronized to a specified tolerance.

² A separate timer synchronization protocol is used to ensure that the clocks for each processor in the
group are in approximate agreement.

View X is defined as better than view Y if one of two conditions holds: 1) X's membership change time
is more recent than Y's or 2) if their change times are identical, X's membership list is "greater" than Y's
when the lists of processor IDs are compared position by position. Zero (indicating absence of a
processor) is considered to be greater than any non-blank entry. For example, let processors A, B, C,
and D have IDs 1, 2, 3, and 4, respectively. View 234 is better than 134 since 2 > 1 and 1230 is better
than 1234 since 0 > 4.
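The ordering can be expressed as a comparison on (change time, membership list) pairs. The sketch below assumes that a membership list is stored as a fixed-length tuple of processor IDs with trailing zeros for absent entries, so that view 234 in a four-processor group is (2, 3, 4, 0). That layout and the names `View` and `is_better` are our assumptions; only the two rules themselves come from the text.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class View:
    change_time: float
    members: Tuple[int, ...]   # processor IDs; 0 marks an absent entry

def _rank(entry: int) -> float:
    # Zero (absence of a processor) outranks every real processor ID.
    return float("inf") if entry == 0 else float(entry)

def is_better(x: View, y: View) -> bool:
    """True if view x is better than view y under the two rules."""
    if x.change_time != y.change_time:
        return x.change_time > y.change_time            # rule 1: more recent change wins
    # Rule 2: compare the lists position by position (both assumed the same length).
    return tuple(map(_rank, x.members)) > tuple(map(_rank, y.members))

t = 1.0
print(is_better(View(t, (2, 3, 4, 0)), View(t, (1, 3, 4, 0))))    # True, since 2 > 1
print(is_better(View(t, (1, 2, 3, 0)), View(t, (1, 2, 3, 4))))    # True, since 0 > 4
print(is_better(View(2.0, (1, 2, 3, 4)), View(t, (2, 3, 4, 0))))  # True, newer change time
```

Under the second rule a view that has lost members tends to rank above a fuller view with the same change time.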

To  attain  view  consistency,  worse  views  "conform"  to  better  views  as  follows.  Given  that  X’s  view  is 
better than Y's, first determine if X's view is a proper subset of Y's. If yes, then the processor with
view Y marks as failed those processors with IDs belonging to the view difference.* In Figure 2a, B's
view  (ABD)  is  a  subset  of  A’s  view  (ABCD)  and,  hence,  processor  A  deletes  C  (view  difference  of 
ABCD  and  ABD).  Like  A,  processor  D  removes  C  from  its  view.  Note  that  to  conform  to  the  received 
better  view  (ABD),  C  must  shut  itself  down.  The  result  now  is  that  A,  B,  and  D  have  updated  and 
consistent  views  (viz.,ABD).  If  X  is  not  a  subset  of  Y,  then  the  processor  with  the  worse  view  must  fail 
itself.  It  marks  itself  as  failed  and  broadcasts  an  Accept  Member  Depart  (AMD)  group  message.  All 
processors  receiving  the  AMD  message  update  their  views  at  the  same  time  to  reflect  the  failure  of  the 
AMD's  author. 
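The conform step can be sketched as a function of the receiver's own view and a better view it has received. The names below are hypothetical, views are modelled simply as sets of processor names, the check that the incoming view really is better is assumed to have been made already, and any PMC broadcasts triggered by the change are left out.

```python
def conform(me, my_view, better_view):
    """Return (new_view, shut_down, message) for a processor holding the worse view."""
    if better_view == my_view:
        # Same membership: nothing to fail (our assumption for this case).
        return set(my_view), False, None
    if better_view < my_view:                 # better view is a proper subset of mine
        failed = my_view - better_view        # the view difference
        if me in failed:
            return set(), True, None          # I am absent from the better view: shut down
        return my_view - failed, False, None
    # Better view is not a subset of mine: fail myself and announce the departure.
    return set(), True, "AMD"

abcd = {"A", "B", "C", "D"}
abd = {"A", "B", "D"}
view, down, msg = conform("A", abcd, abd)
print(sorted(view), down, msg)                # ['A', 'B', 'D'] False None
view, down, msg = conform("C", abcd, abd)
print(sorted(view), down, msg)                # [] True None
view, down, msg = conform("B", {"B", "C"}, {"A", "D"})
print(sorted(view), down, msg)                # [] True AMD
```

The first two calls reproduce the Figure 2a example, where A deletes C and C shuts itself down; the third shows the non-subset case.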

The  second  process  comprising  the  GM  algorithm  allows  processors  to  join  a  group.  There  are  many 
special  cases.  In  this  paper,  we  consider  the  simple  case  of  a  processor  attempting  to  join  an  existing 
group. The basic notion is that before a processor can join a group, it must synchronize its RC and VAL
timers  with  those  of  the  existing  group  members  and  receive  a  copy  of  the  membership  views  from  the 
group  members. 

For the sake of simplicity, assume that the existing group consists of a single member A which
Processor  B  wishes  to  join.  As  illustrated  in  Figure  2b,  when  Processor  B  receives  an  ARC  from  A  it 
uses  the  message's  time  stamp  to  synchronize  its  own  RC  and  VAL  timers  and  broadcasts  an  Accept  Join 
Request  (AJR)  message  to  the  group.  Upon  receiving  the  AJR,  both  A  and  B  add  B  to  their  join  list.  At 
the  next  RC  time,  B  sets  its  Accept  Group  Configuration  Database  (AGCD)  timer  indicating  that  it 
expects  to  receive  a  copy  of  the  membership  view  from  A  while  A  transmits  an  AGCD  message  to  B  (A 
knows  the  identity  of  B  from  the  AJR  message  that  it  received.).  The  AGCD  message  contains  the 
current  membership  view  from  the  perspective  of  its  author.  B  cancels  its  AGCD  timer  upon  reception  of 
the AGCD message and updates its membership view (previously empty) to include A. At this point,
both  A  and  B  have  the  same  view  (A).  B  is  then  added  to  the  membership  views  of  both  A  and  B  at 
validation  time  yielding  a  common  view  of  AB.  A  PMC  message  is  then  broadcast  by  A. 

Depending  on  the  circumstances,  several  rounds  of  PMC  commands  may  be  necessary  to  establish 
consistent  views  among  group  members.  Since  inconsistencies  usually  result  in  removal  of  processors 
and  the  group  has  finite  size,  group  equilibrium  is  achieved  quickly.  Note  that  the  primary  role  of  timers 
within  the  protocol  is  passive;  if  problems  are  not  detected  before  their  expiration,  they  serve  as  a  "last 
line  of  defense".  From  a  real-time  perspective,  the  values  of  the  RC  and  VAL  timers  are  envisioned  to  be 
on  the  order  of  two  and  one  seconds,  respectively. 

3. Analyzing the GM Protocol Using GPSS/H

The approach for analyzing the GM algorithm was straightforward. The first step in the analysis was to
functionally simulate the algorithm using the interactive simulation environment of the GPSS/H modeling
system. The next step was to develop a systematic fault tree. Initially, our analysis focused on
performance faults affecting the messages and timers underlying the algorithm. We assumed that such
faults would be caused either by delays in the communications network or, more likely, since the
network itself had high redundancy, by contention delays within processors attached to the network.
Faults affecting a single message or timer were considered first, followed by multiple faults of the same
type (for example, ARC messages received late [after validation time] at both processors A and B) and
finally, selected combinations of faults involving distinct message and timer types. Since the
number of combinations of multiple faults was practically unlimited, our initial efforts focused on
algorithm correctness for basic single and simple multiple faults.

* View difference is the set of processors that are not in both views.

The  GM  algorithm  model  was  verified  by  replicating  the  results  of  three  fault  scenarios  that  had 
successfully  executed  in  the  laboratory  environment.  Matches  between  lab  and  model  results  provided 
evidence that the model correctly represented the algorithm. Matches in this context do not mean that
model and laboratory results were sufficiently close in value as in traditional simulation verification
exercises but instead that the membership views of each processor as projected by the model exactly
matched the corresponding view observed in the laboratory test.

Given  a  candidate  set  of  fault  scenarios,  the  model  was  modified  to  reflect  the  specific  fault(s)  being 
evaluated  and  then  executed.  The  effects  of  lost  or  late  messages  (and  timers)  were  modeled  by  using 
suitable  delay  values  to  represent  message  transmissions  or  periodic  timers.  Only  a  few  discrete 
transmission  delay  values  were  of  interest  in  the  modeling  runs,  since  an  algorithm  fault  is  not  triggered 
unless  a  message  is  received  after  an  arbitrary  but  fixed  delay  T.  Messages  arriving  before  time  T  are 
"on  time."  Messages  arriving  after  time  T  are  "late."  The  effect  of  an  on  time  message  is  independent  of 
its  exact  arrival  time.  Similarly,  the  effect  of  a  late  message  does  not  vary  with  the  degree  of  lateness.  In 
effect,  sampling  delay  times  was  not  necessary  which  simplified  the  analysis. 
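Because only the on-time versus late distinction matters, a fault scenario can be encoded by choosing one representative delay per class instead of sampling from a distribution. The sketch below illustrates that idea; the numeric values and all names are our own assumptions, and treating a message that arrives exactly at T as late is likewise an assumption, since the text does not cover that case.

```python
T = 1.0                       # assumed cutoff in seconds (hypothetical value)
ON_TIME = 0.5 * T             # representative delay for any on-time message
LATE = 1.5 * T                # representative delay for any late message
LOST = float("inf")           # a lost message never arrives

def classify(delay: float, cutoff: float = T) -> str:
    """A message is 'on time' if it arrives before the cutoff, otherwise 'late'."""
    return "on time" if delay < cutoff else "late"

# A scenario assigns one representative delay to each (message, sender, receiver).
scenario = {
    ("ARC", "A", "B"): ON_TIME,
    ("ARC", "C", "B"): LATE,      # injected performance fault
    ("ARC", "D", "B"): LOST,      # injected lost message
}
for key, delay in scenario.items():
    print(key, classify(delay))
```

Any two delays on the same side of the cutoff give the same classification, which is why a handful of discrete values is enough to cover the cases of interest.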

Key  model  outputs  included  the  membership  views  of  each  group  member  and  a  detailed  trace  which 
annotated  the  specific  model  execution  and  proved  indispensable  in  verifying  that  fault  scenarios  had 
been  implemented  correctly.  Model  results  were  then  evaluated  to  detect  instances  in  which  1)  no  group 
member  survived,  2)  the  protocol  shut  down  multiple  good  (fault  free)  processors  in  order  to  preserve 
consistency,  and  3)  performance  could  be  improved  by  eliminating  redundant  messages.  Relative  to  item 
1,  a  fundamental  design  guideline  was  that  at  least  one  processor  should  always  survive.  Item  2  refers  to 
those  cases  where  the  protocol  behaves  correctly  but  in  a  less  than  optimal  (marginal)  manner.  Results 
indicating potential shortcomings with the protocol and candidate fixes were forwarded to the protocol
developers  for  resolution. 

4.  Detecting  a  Fault  in  the  GM  Protocol 

We  now  describe  a  specific  fault  scenario  in  which  model  results  led  to  a  modification  of  the  GM 
protocol.  Assume  that  processors  A,  B,  and  C  with  IDs  1,  2,  and  3,  respectively,  belong  to  a  group  and 
that  Processor  D  with  the  largest  ID  of  4  is  attempting  to  join  the  group.  Further  assume  that  the  AGCD 
messages  from  group  members  A,  B,  and  C  are  received  late  by  D  (i.e.,  received  after  validation  time). 
The  timeline  analysis  in  Figure  3  was  reconstructed  from  the  model  run.  The  following  discussion 
shows that a perfectly functioning three processor group is totally disabled when a fourth processor
attempts to join. Although the GM algorithm meets its stated requirements by maintaining a consistent
membership view, the end result is far from optimal.

As  previously  described,  the  joining  processor  D  broadcasts  an  AJR  message  to  the  group  members  after 
synchronizing its RC and VAL timers. At the first roll call after transmitting the AJR message (labeled 1
in Figure 3a), D belongs to the join lists of A, B, C, and D. At validation time, A, B, and C add D to
their view which is now ABCD, update the membership change time, and issue PMC messages.
Meanwhile, the AGCD timer (scheduled to expire at validation time) for D expires and D believes it is the
first processor in the group to be initialized (since no AGCD messages have arrived). D adds itself to the
group (which previously was empty), records the membership change time, and has view D; D does not
issue a PMC, since it was not an existing group member. D ignores the PMC messages from A, B, and
C, since the received views are not better than its own (D>ABCD). At the next roll call time (labeled 2),
each  processor  broadcasts  an  ARC  message  and  receives  at  least  one  view  that  differs  from  its  own, 
resulting  in  another  round  of  PMC  messages.  A,  B,  and  C  each  transmit  one  such  PMC  while  D 
generates  three  PMCs,  one  for  each  received  view  that  differs  from  its  own.  The  membership  change 
time is the same (last validation) for all processors.

D again ignores the PMCs from A, B, and C, since D's view is still better than the other processors'
views.  Upon  receiving  D's  view  which  is  better  than  their  own  (D>ABCD)  and  a  subset  of  their  own, 
A,  B,  and  C  must  fail  the  processors  belonging  to  the  set  difference  ABCD-D  =  ABC.  According  to  the 
protocol,  processor  failure  includes  the  following  three  steps:  1)  remove  the  failed  processor  from  your 
membership  view,  2)  update  your  membership  change  time,  and  3)  issue  a  PMC  message.  This  process 
is  repeated  for  each  processor  to  be  failed.  Processor  A  fails  the  first  member  in  the  set  difference, 
namely  itself,  and  shuts  itself  down.  B  fails  processor  A,  updates  the  membership  change  time,  and 
issues  a  PMC  with  view  BCD.  B  then  shuts  down  the  second  processor  in  the  set  difference,  namely 
itself,  and  ceases  processing.  Similarly,  C  issues  PMCs  after  marking  A  and  B  down  with  views  BCD 
and  CD,  respectively,  and  then  fails  itself  during  the  third  pass  through  the  shutdown  processor  logic. 
These incremental steps are labeled R1, R2, and R3 in the figure. Because the PMCs issued by B and C
have more recent change times than processor D, D will shut down after the next round of PMC
messages  is  processed.  The  final  result  is  that  no  group  member  survives  the  join  process. 
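The final round of this scenario is small enough to replay in a few lines. The Python sketch below starts from the state described above (A, B, and C hold view ABCD, D holds view D, all with the same change time) and applies the failure steps in order. It is our own simplified model, not the GPSS/H model: change times are abstract integer counters, and the flag `suppress_pmc_when_failing` encodes our reading of the remedy suggested by the scenario, not the developers' actual modification.

```python
def rank(members, size):
    """Membership list as a padded tuple; absent entries outrank any processor ID."""
    ids = sorted(members)
    return tuple(ids) + (float("inf"),) * (size - len(ids))

def better(x, y, size=4):
    """x and y are (change_time, members): newer change time wins, then the greater list."""
    if x[0] != y[0]:
        return x[0] > y[0]
    return rank(x[1], size) > rank(y[1], size)

def run(suppress_pmc_when_failing):
    A, B, C, D = 1, 2, 3, 4
    views = {p: (0, {A, B, C, D}) for p in (A, B, C)}
    views[D] = (0, {D})
    alive = {A, B, C, D}
    pmcs = []                                    # PMCs issued by A, B, and C

    d_view = views[D]
    for me in (A, B, C):                         # each one receives D's PMC
        t, members = views[me]
        if not (better(d_view, (t, members)) and d_view[1] < members):
            continue
        difference = sorted(members - d_view[1])
        failing = me in difference               # this processor will fail itself
        for p in difference:                     # fail processors one at a time
            if p == me:
                alive.discard(me)                # shut myself down
                break
            members = members - {p}
            t += 1                               # membership change time is updated
            if not (failing and suppress_pmc_when_failing):
                pmcs.append((t, set(members)))

    for pmc in pmcs:                             # D processes the resulting PMCs
        if D in alive and better(pmc, views[D]) and not pmc[1] < views[D][1]:
            alive.discard(D)                     # better view is not a subset: D fails itself
    return sorted(alive)

print(run(suppress_pmc_when_failing=False))      # []  (no group member survives)
print(run(suppress_pmc_when_failing=True))       # [4] (the joining processor D survives)
```

With the flag off, B issues one PMC (view BCD) and C issues two (views BCD and CD) before failing, matching steps R1 to R3, and those newer views force D to fail itself.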

This  scenario  demonstrates  that  a  processor  in  the  act  of  failing  should  not  issue  PMC  messages.  It  is 
interesting to note that if the group consists of only a single member (a scenario previously tested in the
laboratory), then the final result is that the joining processor survives. The results of this scenario
together with others highlight what is a common experience: converting an algorithm which works in
principle  to  one  which  functions  as  real  software  operating  on  real  hardware  often  requires  substantial 
effort. 

5. Conclusions

We  believe  that  this  type  of  modeling  constitutes  a  valuable  application  of  simulation.  The  model  was 
instrumental  in  identifying  cases  where  protocol  behavior  was  unacceptable  (no  processor  survived 
certain  faults)  or  where  behavior  was  correct  but  undesirable,  for  example,  under  heavy  workload 
conditions  where  loss  of  a  good  processor  could  be  critical.  These  cases  were  forwarded  to  the  developer 
for  evaluation.  Their  analysis  concurred  with  the  model’s  results  and  they  modified  the  protocol  to 
correct  the  implementation  errors.  These  corrections  were  incorporated  into  the  normal  incremental 
software  build  cycle  and  thus  prevented  problems  during  the  system  test  phase.  Modeling  of  the  GM 
protocol  also  heightened  our  awareness  of  the  difficulties  in  implementing  protocols  devised  in  a 
laboratory  environment  and  not  yet  extensively  tested  in  the  field. 

1.0 Introduction

The  Systems  Simulation  Branch  (K51)  is  responsible  for  the 
design  and  programming  of  large  software  models,  which  are 
typically  composed  of  hundreds  of  C  and  Fortran  source  files.  The 
source,  header,  and  executable  files  are  made  available  to  users  in 
a  project  directory,  which  is  defined  under  a  common  branch 
directory.  During  project  development  and  maintenance  the 
developer  changes  at  least  one  file  under  his  account,  while  the 
remaining  files  are  used  from  the  project  directory.  These  changes 
are  referred  to  as  temporary  changes  because  the  changes  are  not 
being  applied  to  the  project  files.  Once  the  changes  are  tested 
and  approved,  they  are  ready  for  incorporation  into  the  formal 
project.  In  this  case,  the  updated  files  are  placed  back  in  the 
project  directory  and  permanent  changes  are  made.  The  changes  are 
permanent  because  they  are  incorporated  into  the  released  project. 

The branch chose to use the UNIX* Source Code Control System
(SCCS) for configuration management of the source files and the
UNIX make utility to build the project files. The UNIX make
utility provides a powerful tool to create and update object,
binary, and library files. make accomplishes this by comparing the
dates of the prerequisite files with the target. If any
prerequisite file is found to be more recent than the target,
commands are executed to update the target.
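
The date comparison described above can be sketched in a few lines of Python. This is only an illustration of the rule, not how make itself is implemented; the function name is_out_of_date is hypothetical.

```python
import os

def is_out_of_date(target, prerequisites):
    """Return True if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        # A target that does not exist must always be built.
        return True
    target_mtime = os.path.getmtime(target)
    # The target is stale if at least one prerequisite was modified later.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

A call such as is_out_of_date("a.o", ["a.c", "a.h"]) would then decide whether the compile command for a.o needs to run.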

The  make  utility  must  be  given  a  series  of  dependencies  to 
determine  if  a  target  is  out  of  date.  Any  file  which  is  dependent 
on  another  must  be  explained  via  either  an  implicit  rule  (.c.o: 
means  a.o  is  dependent  on  a.c)  or  a  dependency  statement  (a.o:  a.h 
means  a.o  is  dependent  on  a.h).  The  latter  case  is  important  since 
the  dependencies  of  all  header  files  must  be  explicitly  stated. 
These  dependencies  are  usually  given  in  a  file  named  makefile. 

However, make is difficult to use, especially given the number
of sources and the permanent/temporary changes: permanent
changes use all sources, headers, and objects in a known directory
structure, while temporary changes combine some files in the
project space with some files in the user's space. Additionally,
temporary changes to a header file may affect files which the user
is not currently editing; these files must then be retrieved, using
SCCS, into the user's space and recompiled.

In order to solve these problems and to insulate the
developers from learning make's nuances, K51 developed a set of
tools (scripts) to automatically generate makefiles capable of
handling the temporary/permanent change problem. These easy-to-use
tools provide a powerful, consistent makefile that supports the
branch's development environment. This paper describes the work
performed by personnel in the Systems Simulation Branch (K51) to
enhance the use of the make utility and find solutions to the above
problems. The result is referred to as the K51 Makefile.


*UNIX is a registered trademark of AT&T.

2.0 Capabilities


The  K51  Makefile  was  developed  to  give  users  an  easy 
interface  into  a  complex  UNIX  utility.  It  also  does  the  following: 

•  creates  a  consistent  makefile  for  all  projects. 

•  allows  target/dependency  files  to  reside  in  different 
directories. 

•  allows  permanent  changes  to  be  made  by  updating  project 
files. 

• allows temporary changes to be made in a current working
directory (cwd), automatically compiling and/or linking
the necessary files from project space.

• allows library files to be linked from other
projects/models.

•  allows  simple,  consistent  targets  (temp,  perm,  etc.)  for 
all  projects. 

•  allows  the  capability  of  updating  a  driver  file  and 
library  file  within  the  same  project. 

• allows the capability of creating a driver file on the
fly in a cwd, even if one does not exist in project
space.

•  displays  a  concise  help  screen  when  make  is  entered 
without  a  target  name. 

•  supports  projects  with  FORTRAN,  C,  or  a  combination  of 
source  files. 

•  allows  overriding  of  makefile  macros  from  either  the  make 
command  line  or  by  UNIX  environment  variables. 

All  makefiles  and  scripts  used  by  the  K51  Makefile  can 
execute  in  either  a  Korn  or  Bourne  Shell. 

3.0  Makefile  Generation 

3.1  genmake  command 

A  utility  named  genmake  was  developed  to  generate  or 
update  a  project  makefile.  The  generated  makefile  is  divided  into 
three  sections: 

•  general  macros  including  those  defining  paths  to 
files,  temporary  directory,  project  name,  target 
name,  project  source  files,  and  project  driver 
files. 

• language macros (C, FORTRAN) and miscellaneous
macros.

•  command  which  includes  a  file  containing  all 
targets.  This  file  also  includes  a  file  of  all 
needed  rules. 

3.2 Source/Driver Files


The major purpose of genmake is to add a user's project
file names to a makefile template.

Based on the value of the input parameter srctype
(default is project) and gentype (default is add), file names
(fnames ...) are added to/deleted from the appropriate macro in the
makefile as a function of the file extension. The K51 Makefile
supports the following extensions:

.c C language

.f  FORTRAN  with  no  C  Preprocessor  directives 

.for  FORTRAN  with  no  C  Preprocessor  directives 

.F  FORTRAN  with  C  Preprocessor  directives 

.pf  FORTRAN  with  C  Preprocessor  directives 

When files are added, the entire list of files is sorted
if the input parameter sort is set to yes (the default).

If the input parameter srctype is set to driver then the
files are added/deleted from the DRSRC macro. It is assumed that
one or more of these files are combined to produce a driver for the
project's library file (.a file) either in project space, in cwd,
or both.
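
The add/delete behavior described in this section can be sketched as follows. This is a minimal illustration in Python, not the genmake script itself: the function update_macro, its parameters, and the category labels in EXTENSION_KIND are hypothetical, and only the extension list and the sort-on-add behavior come from the text above.

```python
import os

# Supported extensions as listed above; the labels are illustrative only.
EXTENSION_KIND = {
    ".c": "C",
    ".f": "FORTRAN",
    ".for": "FORTRAN",
    ".F": "FORTRAN with C Preprocessor directives",
    ".pf": "FORTRAN with C Preprocessor directives",
}

def update_macro(current, fnames, gentype="add", sort=True):
    """Return the new file list for one makefile macro (hypothetical helper)."""
    for name in fnames:
        if os.path.splitext(name)[1] not in EXTENSION_KIND:
            raise ValueError("unsupported extension: " + name)
    result = list(current)
    if gentype == "add":
        for name in fnames:
            if name not in result:      # avoid listing a file twice
                result.append(name)
        if sort:                        # the whole list is sorted on add
            result.sort()
    elif gentype == "delete":
        result = [n for n in result if n not in fnames]
    else:
        raise ValueError("gentype must be 'add' or 'delete'")
    return result
```

In this sketch, a driver macro and a project source macro would simply be two separate lists passed through the same function.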


4.0  make  command 


After  generating  a  makefile,  targets  can  be  generated  or 
updated  using  the  make  command.  The  general  form  of  the  command 
used  with  the  K51  Makefile  is 

make  [makeflags]  [target]  [macro  definitions] 

Many flags are available; however, only a few are commonly
used:

• -e allows environment variable definitions to
override the makefile's macro definitions.

• -f specifies description file (default is makefile
or Makefile).

• -n displays commands but does not execute them.

• -p prints the complete set of macro definitions.

The targets are:

• perm

• temp

• drivert


Each  of  the  above  is  described  in  the  sections  that  follow. 

4.1  make  perm  Command 

The purpose of the make perm command is to update the project
files. When the user executes the make perm command in the project
directory that contains makefile, a UNIX shell script (permscpt) is
invoked with all needed values input via positional parameters.

Project dependency information is defined first in the file
makefile.dep, which is included in the user's makefile. This is
the heart of the K51 Makefile, and the information in this file
must be correct for both permanent and temporary changes. The K51
Makefile can accept the project and driver source files either
under the control of the Source Code Control System (SCCS) or as
ASCII text files. The dependency file (makefile.dep) is composed
of five parts:

•  header  information. 

•  source  dependency  information. 

•  source  object  files  list. 

•  driver  dependency  information  (optional) . 

•  driver  object  files  list  (optional) . 

The  header  information  contains  target,  project,  date,  and 
user  information.  To  create  file  dependencies  all  source  files 
must  be  processed  to  determine  all  references  to  header  (.h)  files. 

After the dependencies are found, they are processed based on
the prerequisite file's extension via an awk script and placed in
the dependency file. The full paths to the object files are then
listed under the macro OBJABS.
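
The idea of scanning a source file for header references and emitting a dependency statement can be sketched as below. This is a simplified stand-in written in Python, assuming only quoted #include lines matter and ignoring nested includes; the actual K51 tools use an awk script, and the function names here are hypothetical.

```python
import re

# Matches lines such as:  #include "a.h"   (system headers in <...> are skipped)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+\.h)"', re.MULTILINE)

def header_dependencies(source_text):
    """Return the sorted, de-duplicated header names referenced by the text."""
    return sorted(set(INCLUDE_RE.findall(source_text)))

def dependency_line(source_name, source_text):
    """Build a make dependency statement for the object file of one source."""
    obj = source_name.rsplit(".", 1)[0] + ".o"
    prerequisites = [source_name] + header_dependencies(source_text)
    return obj + ": " + " ".join(prerequisites)
```

For a file a.c that includes a.h, the sketch yields the statement a.o: a.c a.h, which is the form of explicit header dependency that make requires.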

If the project contains a driver, driver file dependency
information is appended to makefile.dep. The driver dependencies are
computed like the source file dependencies, except the object files
are listed under the macro OBJDRV.

After the dependencies are computed, the object files and
target are defined. The K51 Makefile supports the generation of
both a binary target (all object files are linked to form an
executable image) and a library target (all object files are placed
in a random library). If the project contains a driver which calls
functions from a library, it is created/modified after the target
is updated.

4.2  make  temp  Command 

The  purpose  of  make  temp  is  to  update  local,  temporary  files 
without  changing  the  project  files.  The  project's  makefile  must  be 
present  in  the  cwd  to  make  the  temp  target.  When  the  user  executes 
the  make  temp  command,  a  UNIX  shell  script  (tempscpt)  is  invoked 
with  all  the  needed  values  input  via  positional  parameters. 

Since it is possible that project files would be changed if
the user executes make with a makefile.dep file from the project
directory, a test is made to prevent this. Also, the user has the
ability to force compilation of files other than those determined
to be out of date. This is useful when changes to compilation or
preprocessor flags are desired.

Dependencies are computed in the cwd and placed in the local
makefile.dep file. This contains the same information as the
project makefile.dep, except targets are defined to produce local
versions of the project object/executable/library files.

Next all header files in the cwd are examined to determine if
any project source files are dependent on them. It is crucial for
this to be done so that any change to a local header file will
force compilation of every project source file that depends on it.
The project's makefile.dep file is used to determine this.
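
The lookup described here, from local header files back to the project sources that include them, can be sketched as follows. This is a hedged illustration in Python: the dictionary project_deps stands in for the information held in the project's makefile.dep, and the function name and data layout are assumptions, not the K51 file format.

```python
import os

def sources_affected_by(local_headers, project_deps):
    """Return project sources that depend on a header present in the cwd.

    project_deps maps each project source path to the header paths it
    depends on (an assumed in-memory form of the project's makefile.dep).
    """
    local_names = {os.path.basename(h) for h in local_headers}
    affected = []
    for source, headers in project_deps.items():
        # Compare by base name: a local a.h shadows the project's a.h.
        if any(os.path.basename(h) in local_names for h in headers):
            affected.append(source)
    return sorted(affected)
```

Each source returned by such a lookup would then have to be retrieved and compiled locally, even though the user never edited it.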

For all needed source files, a copy of the file is placed in
a temporary directory and the C Preprocessor is used to determine
all dependencies. Finally, the list of object files in the
cwd is appended to the end of makefile.dep, under the macro OBJABS.
If a binary target is generated, the full paths to the object files
that are not generated in the cwd are listed, too. This is not
needed for library targets, since the project library is linked
instead.

If  the  project  contains  a  driver,  driver  file  dependency 
information  is  appended  to  the  makefile.dep  in  the  cwd. 

After  dependencies  are  computed,  the  object  files  and  target 
are  defined.  The  K51  Makefile  supports  the  generation  of  either  a 
binary  or  library  target  in  the  cwd.  The  target  is  dependent  on 
all  source  object  files  (either  in  the  cwd  or  project  space)  and 
the  optional,  external  libraries.  If  the  base  name  of  the  external 
library  is  in  the  cwd,  the  library  in  the  cwd  is  used  in  lieu  of 
the external library. If any of the object files are older than
the corresponding source files (either in cwd or project space),
then the source file is compiled to produce the object file. If
any of the object files or the external libraries are newer than
the target, the target is made by linking the object files with the
external libraries, or by combining the object files to form a
library  in  the  cwd.  If  the  project  contains  a  driver  which  calls 
functions  from  a  library,  it  is  created/modified  after  the  target 
is  updated. 

4.3  make  drivert  Command 

The  purpose  of  this  target  is  to  create  an  executable  driver 
in  the  cwd.  Dependencies  do  not  need  to  be  defined  since  it  does 
not  use  the  makefile.dep  file.  The  user  must  have  the  appropriate 
driver  source  or  object  files  in  the  cwd  before  making  this  target. 

This  target  is  useful  for  users  who  wish  to  create  drivers  on 
the  fly,  using  the  project's  library  file.  Different  drivers  can 
be  created  for  different  purposes,  without  changing  the  project's 
files. 

5.0  Summary 

The  K51  Makefile  was  designed  to  enhance  the  effectiveness  of 
using  the  basic  UNIX  make  utility.  It  meets  the  needs  of 
programmers  who  wish  to  maintain  project  files,  test  temporary 
changes  without  changing  project  files,  and  use  the  powerful  make 
capabilities  without  the  inconvenience  of  having  to  deal  with 
creating and updating the makefile. The genmake utility automates
adding and deleting source file names from the makefile.dep file.
However, the user is still able to edit his makefile and modify
previously defined macros. The burden of dealing with the
makefile's syntax and keeping the dependencies up to date is
hidden from the user. As long as the macro containing the source
file names is kept up to date (using genmake), files upon which the
source files are dependent are automatically generated and included
in the makefile. By entering simple commands (e.g., make perm or
make temp), the user can quickly update his project files or test
new files with his project without affecting the project files.

Abstract

Fixed priority scheduling is commonly used to ensure that periodic tasks will be able to meet
hard real-time deadlines. There are two major approaches by which we can guarantee that a given
taskset can be scheduled according to some fixed priority assignment: utilization bound checks,
which check the total expected processor utilization, and do not require detailed information about
the taskset; and exact schedulability checks, which use detailed information. In this work, we present
a technique for determining period-specific utilization bounds, which use task period information,
generally available at design time, but not task computation time information, which is hard to
determine accurately. The technique we use for determining the bound is an innovative approach
which makes use of linear programming.


1  Introduction 

Scheduling hard real-time periodic tasks using system utilization bounds was first developed by Liu
and Layland [8]. This technique has gained popularity as an approach to designing predictable real-time
systems. They analyzed the rate-monotonic scheduling algorithm, in which higher priorities are assigned
to tasks with shorter periods. They showed that for this algorithm, all tasks were guaranteed to meet
their deadlines provided their combined utilization of the processor did not exceed a certain level (the
worst case utilization bound). They also showed that this algorithm is optimal among all preemptive
fixed priority assignment algorithms for scheduling periodic tasks, for the case where task deadlines are
coincident with the end of a task's period.
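
As a point of reference, the general worst case bound of Liu and Layland for n tasks under rate-monotonic scheduling is n(2^(1/n) - 1), a well-known result that is not restated in this paper. A minimal Python sketch of the corresponding utilization check follows; the function names and the (computation time, period) task representation are illustrative assumptions.

```python
def liu_layland_bound(n):
    """Worst case utilization bound n * (2**(1/n) - 1) for n periodic tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)

def passes_utilization_check(tasks):
    """tasks is a list of (computation_time, period) pairs.

    The check is sufficient but not necessary: a taskset that fails it
    may still be schedulable under rate-monotonic priorities.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= liu_layland_bound(len(tasks))
```

For two tasks the bound is about 0.828, so a taskset with combined utilization 0.45 passes while one with 0.9 does not, regardless of the specific periods involved.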

Subsequent work on this topic has brought up several interesting points:

•  Lehoczky,  Sha  and  Ding  suggest  that  the  average  case  behavior  of  this  algorithm  is  substantially 
better  than  the  worst  case  behavior[6].  They  showed  that  the  behavior  of  this  algorithm  is  strongly 
dependent  upon  the  relative  values  of  the  periods  of  the  tasks  comprising  the  task  set.  This 
suggests  that  if  we  take  advantage  of  information  about  task  periods,  we  could  obtain  a  taskset- 
specific  utilization  bound  which  would  be  much  higher  than  the  general  bound  derived  by  Liu  and 
Layland. 

• Leung and Whitehead introduced a new fixed priority scheduling algorithm, the deadline-monotonic
algorithm, in which higher priorities are assigned to tasks with shorter deadlines [7]. They proved
that the deadline-monotonic algorithm is optimal for the case where tasks have deadlines that
are at or before the end of their periods. This suggests that in order to deal with more general
situations, it would be useful to be able to derive utilization bounds for priority assignments other
than just the rate-monotonic priority assignments.

In  this  work,  we  develop  a  technique  to  determine  the  utilization  bound  for  a  specific  task  set,  where 
we  know  the  period  and  deadline  of  each  of  the  tasks,  but  we  may  not  know  the  task  computation  time. 
This  is  the  most  common  situation,  since  task  periods  and  deadlines  depend  on  the  characteristics  of 
the  application  and  are  usually  fixed  at  design-time,  whereas  computation  times  are  very  difficult  to 
determine,  even  after  the  actual  application  code  has  been  written.  Our  technique  is  applicable  to  any 
arbitrary  fixed  priority  assignment,  and  to  situations  where  some  tasks  must  be  complete  before  the  end 
of  their  periods,  in  which  case  the  Liu  and  Layland  bound  is  not  applicable. 

This technique fills an important gap between the worst case bound, which does not take any task
information into account, and exact schedulability tests [6], which require complete information about
the task set. The problem is an important one in the design of real-time systems. In general, the arrival
rate of each periodic task is fixed at design time. However, it is difficult to determine the computation
time of tasks: the computation to be performed may vary from one arrival to the next (it may be
data-dependent), and various system features, such as interrupts, cache memory, virtual memory, I/O, message
transmission over networks, and resource availability, may all cause variations in the execution time of tasks.
This  difficulty  is  usually  overcome  by  determining  an  upper  bound  on  task  computation  times,  and 
scheduling  for  the  worst-case  situation  when  each  task  requires  this  maximum  time.  However,  it  may 
not  even  be  possible  to  determine  a  useful  worst-case  bound  in  many  applications,  such  as  radar  tracking, 
where  the  computation  time  depends  directly  on  the  number  of  objects  being  tracked.  Therefore,  it  would 
be  very  useful  to  determine  a  utilization  bound  which  takes  into  account  the  specific  task  periods,  but 
does  not  make  assumptions  about  task  computation  times.  Also,  a  utilization  bound  has  the  advantage 
over  an  exact  schedulability  criterion  that  it  can  be  used  to  perform  simple  schedulability  checks  at 
run-time,  when  tasks  may  have  transient  overloads. 
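
A run-time check of this kind can be sketched as a small admission monitor. The Python class below is a hypothetical illustration, assuming the bound has already been computed off-line and that each arriving task reports its computation time and period; it is not part of the technique developed in this paper.

```python
class UtilizationMonitor:
    """Tracks combined utilization against a precomputed bound (sketch)."""

    def __init__(self, bound):
        self.bound = bound          # utilization bound determined off-line
        self.utilization = 0.0      # combined utilization of admitted tasks

    def try_admit(self, computation_time, period):
        """Admit the task only if the bound is still respected."""
        load = computation_time / period
        if self.utilization + load > self.bound:
            return False            # admitting would exceed the bound
        self.utilization += load
        return True
```

The check costs one division and one comparison per arrival, which is what makes a utilization bound attractive at run-time compared with an exact schedulability test.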

This paper is organized as follows: Section 2 develops our technique for determining optimal utilization
bounds for tasksets with known periods. Section 3 discusses some of the ways in which our results
can  be  used.  In  section  4,  we  present  some  examples  and  simulation  results  to  show  utilization  bounds 
obtained  by  our  technique.  We  conclude  the  paper  in  section  5. 


2  Period-specific  Utilization  Bound 

In  this  section,  we  develop  a  technique  to  derive  the  utilization  bound  for  a  taskset  in  which  the  periods 
of  all  the  tasks  are  known,  under  a  scheduling  algorithm  based  on  fixed  task  priorities. 

2.1  System  Model  and  Objective 

We  consider  a  set  of  periodic  tasks  with  each  task  having  a  deadline  before  or  at  the  end  of  its  period. 
We  consider  preemptive  fixed  priority  assignment  where  each  task  receives  a  unique  priority,  and  higher 
priority  tasks  can  preempt  any  lower  priority  task.  Priority  assignment  is  done  at  design  time  before  any 
tasks are scheduled. We assume that task periods are known at design time; however, task computation
times are not known.

Let $\tau_1, \tau_2, \ldots, \tau_n$ be the taskset sorted in the decreasing order of the priorities to be scheduled by
preemptive fixed priority assignment. Let $T_1, T_2, \ldots, T_n$ be the periods of the tasks of the taskset, and
let $D_1, D_2, \ldots, D_n$ be the deadlines of the corresponding tasks, where each $D_i \le T_i$. Let $B_m$ be the
minimum utilization bound that guarantees the schedulability of $\tau_m$, and let $C_1, C_2, \ldots, C_m$ be a set of
positive computation times of tasks $\tau_1, \ldots, \tau_m$ that achieves that bound. We will only analyze the case
when all tasks are initialized at the same time, since Lehoczky proved that this is a critical instance of
the taskset. That is the worst case scenario, when the demand for computation time is the largest.

Our objective is to determine the exact Period-Specific Utilization Bound (PSUB), such that all tasks
are guaranteed to meet their deadlines if their combined utilization is less than or equal to this bound,
and for any utilization greater than this bound there exists a possible set of computation times for which
the taskset is not feasible.


2.2  Determining  the  Period-Specific  Utilization  Bound 

We propose a technique based on linear programming to determine the period-specific utilization bound.


Lemma 1 There is no idle processor time prior to the first deadline of task $\tau_m$.

Proof: Assume that there is $\delta$ idle time prior to the first deadline of the task $\tau_m$. Then even with the
increase of computation time $C_m$ by $\delta$, task $\tau_m$ will still meet its deadline, but the utilization bound $B_m$
increases. Hence, this violates the assumption that the set of computation times achieves the minimum
utilization bound $B_m$. Since the first deadline of $\tau_m$ is the critical instance of the task $\tau_m$, the lemma
follows.

□ 

Next we show that all tasks with higher priority that arrive before the deadline of task $\tau_m$ must
complete their execution before the deadline of task $\tau_m$ in order for $B_m$ to be minimal.

Lemma 2 All tasks with higher priority that arrive before the deadline of task $\tau_m$ must be finished before
the deadline of task $\tau_m$.

Proof: Assume that a request of some task $\tau_i$ with priority higher than $\tau_m$ (so $i < m$) finishes after the
deadline of $\tau_m$ in its critical instance. Let $\delta$ be the amount of computation time for which task $\tau_i$ executes after
the deadline $D_m$ of task $\tau_m$. There are two possible cases.

Case 1: If $T_i \ge D_m$, then since the task $\tau_i$ is still executing after the deadline of task $\tau_m$, there is no
time for lower priority task $\tau_m$ to execute at all. Hence, task $\tau_m$ does not have any computation time or
the taskset is nonschedulable. In both cases we have contradictions to the assumptions.

Case 2: If $T_i < D_m$, then we create a different set of computation times that reduces utilization $B_m$
but is still schedulable. We simply reduce the execution time of $\tau_i$ by $\delta$ and increase the execution time
of $\tau_m$ by the amount of time freed up by task $\tau_i$. So the new computation times $C'_i$ and $C'_m$ of tasks $\tau_i$ and
$\tau_m$ become

\[ C'_i = C_i - \delta \]

\[ C'_m = C_m + \delta \times \lfloor D_m / T_i \rfloor. \]

This transformation preserves the schedulability of the system. Hence, the new utilization of the
system is

\[ B_m + \delta \left( \frac{\lfloor D_m / T_i \rfloor}{T_m} - \frac{1}{T_i} \right). \]

Since the term in the last parentheses is not positive, the new
utilization can only be smaller than or equal to the original one. This violates the assumption that $B_m$ is
minimal and the lemma follows.

□ 


Corollary 1 $\sum_{i=1}^{m} C_i \times \lceil D_m / T_i \rceil = D_m$

As  the  consequence  of  the  above  we  have  the  following  theorem. 


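Theorem 1 The utilization bound $B_m$ for the schedulable task $\tau_m$ can be achieved by the following
procedure:

Minimize

\[ \sum_{i=1}^{m} \frac{C_i}{T_i} \]

subject to

\[ \sum_{j=1}^{m} C_j \times \left\lceil \frac{l \times T_i}{T_j} \right\rceil \ge l \times T_i \quad \text{for } 1 \le i \le m-1 \text{ and } 1 \le l \le \lfloor D_m / T_i \rfloor, \]

and

\[ \sum_{i=1}^{m} C_i \times \left\lceil \frac{D_m}{T_i} \right\rceil = D_m, \]

and

\[ C_i > 0 \quad \text{for } 1 \le i \le m. \]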


Proof: The first set of equations expresses the constraint that there should be no idle time in the system,
as proved in lemma 1. The second equation expresses the constraint of lemma 2, that there should be no
overflow at the time $D_m$. The third equation simply constrains execution times to be positive. Thus this
system of inequalities determines the worst-case combination of computation times which meets these
constraints, and has the lowest combined utilization. □
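
The structure of this linear program can be sketched in Python. The code below only builds the constraints and evaluates a candidate set of computation times; it does not solve the program. It reads the first constraint set as $\sum_j C_j \lceil l T_i / T_j \rceil \ge l T_i$ and the second as $\sum_i C_i \lceil D_m / T_i \rceil = D_m$, which is an interpretation of the theorem, and it assumes integer periods and deadlines. All function names are hypothetical.

```python
from fractions import Fraction

def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)

def psub_constraints(periods, deadline_m):
    """Constraints for task m = len(periods); periods are in priority order.

    Returns (ge_rows, eq_row); each row is (coefficients, right_hand_side).
    """
    m = len(periods)
    ge_rows = []
    for i in range(m - 1):                      # higher priority tasks only
        for l in range(1, deadline_m // periods[i] + 1):
            t = l * periods[i]                  # l-th arrival of task i
            ge_rows.append(([ceil_div(t, tj) for tj in periods], t))
    eq_row = ([ceil_div(deadline_m, tj) for tj in periods], deadline_m)
    return ge_rows, eq_row

def is_feasible(times, ge_rows, eq_row):
    """Check positivity, the no-idle-time rows, and the equality at D_m."""
    def dot(coeffs):
        return sum(a * c for a, c in zip(coeffs, times))
    return (all(c > 0 for c in times)
            and all(dot(coeffs) >= rhs for coeffs, rhs in ge_rows)
            and dot(eq_row[0]) == eq_row[1])

def utilization(times, periods):
    """Objective value: the sum of C_i / T_i, computed exactly."""
    return sum(Fraction(c) / t for c, t in zip(times, periods))
```

For two tasks with periods 2 and 3 and $D_2 = 3$, the constraints reduce to $C_1 + C_2 \ge 2$ and $2C_1 + C_2 = 3$; the feasible point $C = (1, 1)$ has utilization 5/6, while another feasible point such as $C = (1/2, 2)$ has the higher value 11/12. In practice the constraint rows would be handed to a linear programming solver rather than checked point by point.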

Notes: 


• We do not need any additional constraint to express the idea that $\tau_m$ meets its deadline; it is
automatically ensured by lemma 2.

• Though these inequalities seem complicated, in fact they are identical, but opposite, to the inequalities
in the exact schedulability condition developed by Lehoczky, Sha and Ding [5]; i.e., the exact
schedulability criterion has $\le$ where we have $\ge$ symbols. This is because we are expressing the
idea that the task arrivals will at least consume all the available time (no idle time), whereas they
are ensuring that all the arrivals will complete within that time, ensuring scheduling feasibility.

• This bound merely ensures that $\tau_m$ will meet its deadline as long as the system utilization does
not exceed this bound. It does not give similar assurances for the other tasks. We obtain this
assurance by iterating this procedure over all n tasks.

Theorem 2 The period-specific utilization bound PSUB for the system is the smallest $B_i$ for each task
$\tau_i$, such that

\[ PSUB = \min_{m=1}^{n} B_m \]

Proof: Since $PSUB \le B_i$ for all $1 \le i \le n$, all tasks $\tau_i$ are guaranteed to meet their deadlines. The
bound is tight, since there exists some task $\tau_k$ for $1 \le k \le n$ such that $PSUB = B_k$. By theorem 1, this
task $\tau_k$ does not have any idle time. □

Hence, we can find the PSUB by solving n linear programming problems to determine the utilization
bound $B_m$ for each task.


2.3  Discussion 


The  technique  we  have  outlined  above  is  significant  because  it  enables  us  to  obtain  a  utilization  bound 
test  for  any  arbitrary  fixed  priority  assignment  algorithm.  Moreover,  for  the  special  case  of  the  rate- 
monotonic  scheduling,  it  gives  us  a  higher  (tighter)  bound  for  specific  tasksets  than  the  more  general 
Liu  and  Layland  bound. 

In  addition  to  this,  there  is  another,  broader  significance  to  this  technique.  To  date,  analytical  results 
about  the  rate-monotonic  and  other  fixed  priority  scheduling  algorithms  have  followed  the  techniques 
introduced  by  Liu  and  Layland,  and  developed  further  by  Lehoczky,  Sha  and  the  Carnegie-Mellon  group, 
of  identifying  and  analyzing  specific  worst-case  situations.  A  significant  aspect  of  our  result  is  that  it 
opens  up  an  alternative  door  to  analyzing  the  behavior  of  the  system,  based  on  using  optimization 
techniques  to  identify  worst-case  behavior.  The  general  approach  of  setting  up  constraint  equations  to 
model  system  parameters,  and  using  optimization  to  derive  boundary  conditions,  is  a  highly  extendible 
one,  and  we  are  already  currently  using  it  to  solve  other  similar  problems  of  interest.  We  are  very 
optimistic  about  its  potential  as  an  alternative  analytical  tool. 

There are several different algorithms for solving linear programs [2, 4, 3]. The best of them
run in polynomial time [4, 3]. However, the potentially high time complexity of linear
programming  is  not  a  problem  for  our  application,  since  we  are  determining  utilization  bounds  off-line, 
at  system  design  time. 


3  Applications  of  the  result 

There  are  many  different  ways  in  which  this  result  can  be  applied: 

•  For  rate-monotonic  scheduling,  if  we  have  a  taskset  with  known  periods,  we  can  derive  a  utilization 
bound  and  use  it  to  check  for  scheduling  feasibility,  even  if  exact  computation  time  bounds  are  not 
available.  This  technique  can  also  handle  some  situations  where  tasks  must  be  completed  before 
the  end  of  their  period,  in  which  case  the  Liu  and  Layland  bound  is  not  applicable. 

• Occasionally, we may wish to assign non-rate-monotonic priorities to some tasks; for example, we
may wish to raise the priority of some low-frequency, high-criticality task to ensure that it will
not miss the deadline even in overload situations. We can use this technique to obtain utilization
bounds and guarantee schedulability in these situations.

• System designers may have flexibility in choosing task periods. For many applications, such as
monitoring and data acquisition, the requirement is simply to "sample at least every 25 seconds"
etc. In these cases, system designers may be able to adjust task frequencies in such a way that
the utilization bound is increased (select harmonic frequencies). For example, if other tasks in
the system have a period of 12, choosing a period of 24 may lead to a much higher utilization
bound than choosing the period to be 25. Thus, contrary to intuition, we may be able to get better
schedulability while also increasing system performance. We develop this theme further in examples
in the next section, and analytical results on this topic can be found in [9].

•  In  some  cases,  we  may  have  some  (incomplete)  information  about  computation  times.  For  example, 
we  may  know  the  minimum  computation  time  for  some  of  the  tasks,  and  perhaps  exact  computation 
times  for  some  others  [1].  By  modifying  the  constraint  equations  appropriately,  it  is  possible 
to  derive  progressively  better  bounds  as  more  information  is  known  about  the  system,  thereby 
increasing  the  range  of  systems  which  can  be  guaranteed. 

• In addition to these design-time applications of the bound, it can also be used to check schedulability
at run-time. Sometimes, the computation times of tasks may be known only at task arrival
time  when  the  input  data  size  is  known  (e.g.  radar  tracking,  where  the  processing  time  may  depend 
on  number  of  targets).  Utilization  bounds  can  be  used  to  check  schedulability  by  keeping  track  of 
the  combined  processor  utilization  of  all  the  tasks  in  the  system. 

•  The  individual  task  utilization  bounds  can  themselves  be  useful  to  detect  which  tasks  may  overrun 
in  particular  overload  situations.  Careful  design  can  also  ensure  that  critical  tasks  are  less  likely 
to miss their deadlines, by adjusting priorities so that their task utilization bounds are not the
critical  ones. 
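
The run-time check described in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the period-specific bound is passed in as a parameter because its linear-programming derivation is not reproduced here, and the classical Liu and Layland bound n(2^(1/n) - 1) is included only as a fallback. The function names and the sample taskset are assumptions.

```python
from typing import Iterable, Tuple


def liu_layland_bound(n: int) -> float:
    """Classical worst-case rate-monotonic bound n(2^(1/n) - 1) for n tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2.0 ** (1.0 / n) - 1.0)


def total_utilization(tasks: Iterable[Tuple[float, float]]) -> float:
    """Sum of C_i / T_i over (computation_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def passes_utilization_test(tasks, bound: float) -> bool:
    """Sufficient test: accept the taskset if its utilization does not exceed the bound."""
    return total_utilization(tasks) <= bound


# Hypothetical taskset: (computation time, period) pairs, for illustration only.
tasks = [(2.0, 12.0), (6.0, 24.0)]
accepted = passes_utilization_test(tasks, liu_layland_bound(len(tasks)))
```

A taskset rejected by this test is not necessarily unschedulable; the test is only sufficient, which is why a tighter period-specific bound admits more tasksets.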


4  Examples  and  simulation 

We  present  some  examples  and  the  result  of  simulation  in  this  section. 

Table 1 shows the concept of individual task utilization bounds. Task 2 has a bound of 83.33%. This
bound occurs when the computation times of task 1 and task 2 are 100 and 200 respectively. Similarly,
we show the bounds for tasks 3 and 4, and the corresponding worst-case computation times. Of course,
the computation times of lower priority tasks are irrelevant in determining the bound for higher priority
tasks. The table illustrates that task 4 is safe as long as the system utilization is below 98.37%.
Task 2 may be threatened if the combined utilization of $\tau_1$ and $\tau_2$ exceeds 83.33%. Also, task 3 may be
threatened if the combined utilization of $\tau_1$, $\tau_2$ and $\tau_3$ exceeds 83.07%. Thus, all tasks are guaranteed
to meet their deadlines if their combined processor utilization is equal to or less than 83.07%.
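
The way the individual bounds combine into a system-wide bound can be sketched as below. The three percentages are the ones quoted in the text; the bound for task 1 is not quoted and is therefore omitted. The helper names and the sample cumulative utilizations are hypothetical.

```python
# Individual task utilization bounds quoted in the text, in percent.
task_bounds = {2: 83.33, 3: 83.07, 4: 98.37}


def system_bound(bounds):
    """The system-wide guarantee is the smallest individual bound."""
    return min(bounds.values())


def threatened_tasks(bounds, cumulative_utilization):
    """Tasks whose cumulative utilization (tasks 1..i) exceeds their own bound."""
    return [i for i, b in sorted(bounds.items()) if cumulative_utilization[i] > b]


# Hypothetical cumulative utilizations of tasks 1..i, in percent.
example = {2: 60.0, 3: 85.0, 4: 90.0}
at_risk = threatened_tasks(task_bounds, example)
```

With these sample numbers only task 3 is flagged, because 85.0 exceeds its bound of 83.07 while tasks 2 and 4 stay below theirs.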

Table 2 shows the results of some simulation studies for determining period-specific utilization bounds
for randomly generated tasksets. Task periods were uniformly distributed between 20 and 2400. The
table shows the minimum, average, and maximum PSUB for the several tasksets generated. It should be
noted that uniform distributions are not likely to produce tasksets with harmonic frequencies, particularly
for large tasksets. In contrast, in real systems, frequencies are multiples of some basic clock cycle, hence
they do tend to be harmonic and clustered.

Table 1. An example illustrating Optimal Br and their Cr in the critical zone

Table 2. Simulation Result: Period Range 20 - 2400


5  Conclusion 

Our  technique  fills  an  important  gap  between  the  worst-case  schedulability  test  which  does  not  take  any 
task  information  into  account,  and  exact  schedulability  tests,  which  require  complete  information  about 
the  taskset.  System  designers  can  use  this  technique  to  obtain  a  better  check  of  scheduling  feasibility. 
An additional major benefit of our technique is that it opens up an alternative approach based on
linear  programming  to  analyzing  the  behavior  of  fixed-priority  scheduling.  We  are  optimistic  about  the 
potential  of  this  approach  because  of  its  easy  extensibility  to  a  variety  of  scheduling  problems. 

Abstract

Using a stochastic control approach, we address the combined problem
of scheduling both periodic tasks and inter-task messages in distributed
real-time systems.

First, the concurrent execution of tasks and processing of messages
are modeled as a sequence of continuous-time Markov chains. Then, the
combined task and message scheduling problem (TMSP) is formulated
as a Markov decision process to minimize the expected number of tasks
missing deadlines. Both centralized and decentralized solutions to the
TMSP are derived using the dynamic programming technique.

For the centralized case, the global, up-to-date information on the
execution of tasks and processing of messages at each computing node
(CN) is assumed available to all other CNs so that an optimal scheduling
decision can be made. For the decentralized case, however, each
CN makes scheduling decisions using only its own local information
and the other CNs’ information, which is periodically broadcast.
Illustrative examples are presented and optimal broadcast frequencies are
determined.

Index Terms: Centralized/decentralized task scheduling, task deadlines,
Markov decision process, one-step delayed sharing information
pattern, probability state, stochastic control, dynamic programming.

1  Introduction


In a real-time system, the normal workload is composed of a set of periodic
tasks, which is known a priori and usually pre-assigned to the computing nodes
(CNs) of the distributed system for execution. Generally, these tasks communicate
with one another to accomplish the overall mission, and inter-task communications
introduce precedence relations among the corresponding parts,
called activities, of the communicating tasks. Owing to its data-dependent
conditional branches and loops, an activity usually takes a random amount of
time to complete.

The main objective of this paper is to formulate and solve the problem of
scheduling both periodic tasks and inter-task messages in a distributed real-time
system such that the long-term expected number of periodic tasks missing
deadlines is minimized.

Our  combined  task  and  message  scheduling  problem  (TMSP)  deals  with 
each  CN’s  decision  on  the  execution  of  its  periodic  tasks  as  follows: 

D1. Which of the ready activities in a CN should be executed next if the number
of free processors at the CN is less than that of ready activities?

D2. Which of the messages that have arrived at a CN must be processed next?

D3. While waiting for a specific message to arrive, should the CN continue
to wait for the message, or abandon the wait and execute, instead,
a certain default activity?

D1 and D2 are typical problems addressed in the scheduling domain, while
D3 needs further elaboration. As mentioned before, periodic tasks communicate
with one another for synchronization and information exchange. The
communicating partners involved are usually blocked (e.g., [14]) until the
communication is completed, meaning that no activities requiring the information
are allowed to continue. This blocking communication scheme could result
in a situation where one or more of the communicating partners are delayed
indefinitely (until the end of their period) waiting for a message which may
never arrive. This could be due to, for instance, the failure of the sending CN
or a communication link, or termination of the replying task. To ensure the
timely completion of each task, there must be a provision for a communicating
partner to carry on its execution even in the absence of the requested
information. Of course, to compensate for the missing information, some form of
default activity should be invoked. This default activity introduces an extra
computation load to the corresponding CN. It is D3 that deals with the decision
on whether to wait for an outstanding message or to immediately execute
the default activity and use the execution results instead.

Notice that D3 can equally well be interpreted as a CN’s decision between
two tasks. That is, while executing a task, called a primary, the CN either
continues to execute the primary or abandons the primary and executes, instead,
a certain replacement task. For instance, we may use the fifth control
law as the primary and the third control law, which is less accurate but needs
less time to complete, as its replacement. D1-D3 are actually three dependent
parts of a single problem, since none of them can be solved without
considering the others.

For  a  set  of  independent  periodic  tasks  each  with  fixed  execution  times,  the 
rate  monotonic  scheduling  algorithm  [7]  has  been  rigorously  studied  [15,  8]. 
Liu et al. [9] considered using imprecise computation results as long as the
mandatory  part  of  the  task  meets  its  deadline.  To  the  best  of  our  knowledge, 
however,  the  TMSP,  which  includes  dependent  periodic  tasks  and  random  task 
execution  times,  has  not  been  addressed  in  the  literature,  perhaps  because  of 
the  difficulty  associated  with  it  [12]. 

Based on the continuous-time Markov chain (CTMC) model of task executions,
we will first transform the TMSP into a Markov decision process
(MDP) [13]. The MDP is then solved with the dynamic programming (DP)
technique  [2].  As  described  in  [11],  the  CTMC  model  is  built  typically  via  the 
construction  of  a  generalized  stochastic  Petri  Net  (GSPN)  [10]. 

The rest of the paper is organized as follows. In Section 2, we show how the
notion of default activity is modeled in the GSPN, and then describe briefly
the method of [11] to build the underlying CTMC model. The centralized
TMSP is formulated and solved in Section 3. In Section 4, we solve the
decentralized TMSP, for which each CN periodically broadcasts its local state to
other CNs. Using this solution, we also identify the optimal frequency of state
broadcasts. For both centralized and decentralized TMSPs, the computational
complexities of the proposed solution algorithms are also addressed briefly in
Sections 3 and 4. Finally, concluding remarks are made in Section 5.

2  The  System  CTMC  Model 

Since the state-space approach is to be applied to the TMSP, we need to show
first how the system CTMC model is constructed. We begin with the GSPN
model of the default activity. Then, a typical method of [11] to construct the
CTMC model is briefly described.

2.1  GSPN  Model  of  Default  Activity 

A default activity is extra work that a CN can choose to perform instead of
waiting for an outstanding message. To avoid wasting the computing
resource in a CN, default activities are implemented as follows. While waiting
for an outstanding message x, the CN will execute all ready activities including
the default activity $R_x$ provided that there are enough free processors in the
CN. Otherwise, the CN may or may not choose to execute $R_x$. Then, the CN’s
next action will depend on the following two cases:

C1. $R_x$ is started and completed before x arrives. The CN uses the results
of $R_x$ and proceeds as if the precedence constraints imposed by x were
met. No message is sent back to unblock the communicating partner
from which x originated.

C2. x arrives before $R_x$ is completed (started). The CN stops executing
(starting) $R_x$, immediately processes x and, if needed, sends a message back
to unblock its communicating partner.

The GSPN model of a default activity is shown in Fig. 1, where the typical
communication primitives SEND-RECEIVE-REPLY (Fig. 1(a)) and
QUERY-RESPONSE (Fig. 1(b)) have been used. Notice first that a "control place"
has been created in the sending (querying) side to ensure that the CN is unblocked
either by receiving the reply message r (respectively, the response to the query)
or by completing the corresponding default activity, but not both. Second,
unlike SEND-RECEIVE-REPLY, the task being queried is not blocked by the
QUERY request (Fig. 1(b)). Third, if the receiving CN in Fig. 1(a) gives up on
waiting for message s and completes the default activity $R_s$, then the sending
CN will eventually be forced not to wait for the associated reply message r,
which will never arrive, by executing the default activity $R_r$. On the other
hand, if message s arrives before $R_s$ is completed then, from C2, the receiving
CN must switch immediately to processing s provided there is a free processor.

2.2  Constructing  the  System  CTMC  Model 

Consider a distributed real-time system, where the tasks have been pre-assigned
among the set $\{N_k : k = 1, 2, \ldots, m\}$ of m CNs. Since each task repeats itself
at regular intervals, it is sufficient to solve the TMSP within the planning cycle
$I = [0, L)$ only, where L, the length of I, is the least common multiple of all
task periods. For simplicity, we assume that all tasks are invoked simultaneously
at the beginning of I and that each must be completed by its next invocation time;
otherwise, it will simply be discarded. For completeness, a typical method [11]
to construct the system CTMC model is briefly reviewed in what follows.

The CTMC model to be built is described by its state space and state
transitions. Each state represents a particular stage of the concurrent executions
of tasks on all CNs, whereas a transition represents either a task invocation
(time-driven transition) or the completion of an activity (event-driven
transition). Consequently, the resulting CTMC model is composed of a sequence
of $l$ CTMCs, where $l$ is the number of distinct task invocation instants in I.
Within each individual CTMC, the system evolution is determined only by
the event-driven transitions representing the completions of activities.
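
As a small illustration of the planning cycle and its invocation instants, the sketch below computes the least common multiple of the task periods and the distinct instants at which some task is invoked. It assumes integer periods and uses the periods 5, 10 and 5 of the example task system discussed later in this section; the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def planning_cycle(periods):
    """Length of the planning cycle: least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def invocation_instants(periods):
    """Distinct task invocation instants within [0, L), in increasing order."""
    cycle = planning_cycle(periods)
    return sorted({k * p for p in periods for k in range(cycle // p)})


periods = [5, 10, 5]                       # periods of the example task system
cycle = planning_cycle(periods)            # length of the planning cycle
instants = invocation_instants(periods)    # one CTMC per instant
invocations = [cycle // p for p in periods]
```

For these periods the cycle has length 10, tasks are invoked at instants 0 and 5, and the three tasks are invoked twice, once and twice, which matches the description of the example.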

The  CTMC  modeling  procedures  are  summarized  as: 

•  To  alleviate  the  problem  of  state  space  explosion,  contiguous  stretches  of 
executable  codes  are  combined,  whenever  possible,  to  build  the  smallest 
number  of  activities  while  preserving  all  precedence  constraints  among 
tasks  and  the  expected  task  execution  times. 

•  The GSPN is used to model the concurrent execution of activities and
their precedence constraints. The resulting GSPN model is then transformed
into a sequence of CTMCs by performing reachability analysis
on the GSPN, and replacing each firing delay (event-driven transition)
with an appropriate exponentially distributed random variable.

Suppose the tasks are invoked at time instants $0 = \omega_1 < \omega_2 < \cdots < \omega_l < L$.
The resultant CTMC model can thus be described formally as
$\{(S_u, \Lambda_u, \Theta_u) : u = 1, 2, \ldots, l\}$, where $S_u$ is the set of states reachable
within time interval $I_u = [\omega_u, \omega_{u+1})$, and $\Lambda_u : S_u \times S_u \to T$ is the
event-driven transition function between any two states in $S_u$, where T is the set
of all event-driven transitions. We use $a_{ij}$ to denote the activity representing the
transition from state $s_i$ to $s_j$ in $S_u$, and use $\mu_{ij}$ to represent the transition
rate (i.e., the execution rate) of $a_{ij}$. Finally, $\Theta_u : S_u \to S_{u+1}$ is the
time-driven transition function, which specifies for each state in $S_u$ a particular
state in $S_{u+1}$ the system will be in when a task is invoked at $\omega_{u+1}$.
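
One possible in-memory representation of a single element of this sequence, i.e. the state set, the event-driven transitions with their rates, and the time-driven map applied at the next invocation instant, is sketched below. The class name, field names and the toy states are assumptions made for illustration; they are not taken from the paper or from [11].

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class IntervalCTMC:
    """One triple (states, event-driven transitions, time-driven map) for an interval."""
    states: Set[str]
    # (source, destination) -> (activity label, execution rate)
    transitions: Dict[Tuple[str, str], Tuple[str, float]] = field(default_factory=dict)
    # state in this interval -> state in the next interval at the next invocation
    theta: Dict[str, str] = field(default_factory=dict)

    def exit_rate(self, state: str) -> float:
        """Total rate of all event-driven transitions leaving the given state."""
        return sum(rate for (src, _), (_, rate) in self.transitions.items() if src == state)


# Toy instance with hypothetical states and rates.
toy = IntervalCTMC(
    states={"s0", "s1", "s2"},
    transitions={("s0", "s1"): ("a1", 2.0), ("s0", "s2"): ("a2", 4.0)},
    theta={"s0": "t0", "s1": "t0", "s2": "t1"},
)
```

Storing the rates on the transitions keeps each interval self-contained, so the whole model is simply a list of such objects, one per invocation instant.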

For example, consider the simplified task system in Fig. 2 to be used
throughout the rest of the paper. It shows a GSPN model where three periodic
tasks $T_1$, $T_2$ and $T_3$ are pre-assigned to $N_1$, $N_2$ and $N_3$, respectively.
$T_1$, $T_2$ and $T_3$ with periods 5, 10 and 5 are invoked twice, once and twice,
respectively, within the planning cycle I = [0, 10). $T_1$ queries $T_2$ for
information, whereas $T_3$ communicates with $T_2$ for synchronization. Suppose default
activities, labeled as $a_{14}$ and $a_{15}$ in Fig. 2, are provided only for $T_1$ related to
$T_2$’s response; no default activities are provided for synchronization between
$T_3$ and $T_2$. Branch conditions follow two of the activities in Fig. 2. To guarantee the
successful synchronization between $T_3$ and $T_2$, we assume these two branches
are identical.

At the beginning of I, a token is generated in the starting place of each task
(Fig. 2). At any time $t \in (0, 5)$, states evolve based on event-driven transition
firings only. At t = 5, when $T_1$ and $T_3$ are invoked again, the marking of the
GSPN is determined as follows: 1) a token is generated in the starting places of
the second invocations of $T_1$ and $T_3$, 2) all tokens left in the places belonging
to the first invocations of $T_1$ and $T_3$ are removed to discard these unfinished
invocations, and 3) the tokens in $\phi_{19}$-$\phi_{26}$ are still determined by
event-driven transition firings. At $t \in (5, 10)$, the state evolution is again
determined by event-driven transition firings until t = 10, when the system repeats
itself for the next planning cycle. Let the execution (i.e., transition) rate $\mu_i$ of
each generic activity $a_i$ take one of the values 1, 2, 4 or 6 (for instance,
$\mu_7 = \mu_{12} = \mu_{14} = 1$), and let the branching probabilities be p = 2/3.
After performing reachability analysis on the GSPN,
a total of 90 (108) states are generated within $I_1 = [0, 5)$ ($I_2 = [5, 10)$).
Appendix A lists all the states where a scheduling decision must be made
because more than one scheduling option is available.

From  the  brief  descriptions  above,  it  can  be  seen  that  the  CTMC  model 
fully  describes  the  behavior  of  the  system  under  all  scheduling  decisions  of 
the  CNs,  each  of  which  may  even  have  any  number  of  processors.  Based  on 
this  model,  it  is  our  purpose  to  derive  the  optimal  scheduling  decision  for  each 
CN  such  that  the  expected  number  of  tasks  (invocations)  missing  deadlines  is 
minimized. 

In the centralized case to be addressed in the next section, we assume that
each global system state $s_i \in S = \bigcup_{u=1}^{l} S_u$ is available to all CNs. To facilitate
the analysis of the decentralized TMSP in Section 4, the local state available
to the individual CN needs to be identified first. This local state is embedded
in the global state $s_i$ and can be defined by $N_k$’s local state space $S_u^k$ within
$I_u$. $S_u^k$ is constructed by first identifying those GSPN places associated with
$N_k$ only and then selecting the markings of such places from each state in $S_u$.
An element in $S_u^k$ thus represents the information available to $N_k$ only. For
example, if $s_i$ denotes the global state "$a_{14}$ has been completed on $N_1$, another
activity has been completed on $N_2$, but delay $a_{10}$ has not been completed", then $N_1$’s local
state corresponding to $s_i$ denotes "$a_{14}$ has been completed on $N_1$" only. For
more details, the readers are referred to [11].
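
The projection from a global state onto one CN's local state can be sketched as a simple filter over the marked places. The marking (2, 4, 6, 20, 22, 27) is the one the paper later gives for state $s_3$; the assumption that places 19 to 26 belong to $N_2$ is ours, made only so that the example has something to select.

```python
def local_state(global_marking, node_places):
    """Keep only the marked places that belong to the given computing node."""
    return tuple(p for p in global_marking if p in node_places)


s3 = (2, 4, 6, 20, 22, 27)           # marking quoted for state s_3 in Section 3.4
places_of_n2 = set(range(19, 27))    # assumed ownership: places 19..26 belong to N_2
print(local_state(s3, places_of_n2)) # prints (20, 22)
```

Distinct global states can project onto the same local state, which is exactly why a CN that sees only its own markings has less information than in the centralized case.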

3  Optimal  Centralized  TMSP 

By discretizing the planning cycle I into a large number of small intervals,
the TMSP becomes a problem of solving the Markov decision process (MDP)
embedded in the system CTMC model. The technique of dynamic programming
(DP) is to be used to solve the MDP. Although DP is a well-known
technique, the main issue in solving the MDP with it lies in the
correct identification of the following characterizing elements of the MDP:
decision set, one-step transition probability, and one-step cost, so that functional
equations can be established for an optimal solution (see, e.g., [2]). Assume
the planning cycle I has been divided into $V \gg 1$ equally spaced intervals
each of length $h = L/V$, and each task invocation always occurs at an epoch,
defined as the left boundary of an interval.

3.1  Decision  Set 

The decision set $D_i$ for state $s_i \in S$ is the set of all scheduling options available
when the system is in $s_i$. $D_i$ is determined by combining (over all CNs) the set
of scheduling options available to each individual CN. For example, if $N_1$ has
2 scheduling options and $N_2$ has 3 in $s_i$, then $D_i$ contains a total of 2 x 3 = 6
scheduling options. A policy $\pi$ specifies which activities to choose for each CN
in each state $s_i \in S$ at each epoch v, $0 \le v \le V - 1$. The policy space $\Pi$ is
the set of all such $\pi$’s. For convenience, the set of activities chosen by all CNs
under $\pi$ in state $s_i$ at epoch v is denoted by $D_i^\pi(v)$.
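
Combining the per-CN options into a decision set amounts to a Cartesian product, which can be sketched as follows. The option labels are placeholders chosen for illustration and do not correspond to a particular state of Fig. 2.

```python
from itertools import product


def decision_set(options_per_cn):
    """All joint scheduling options: one choice per CN, combined over all CNs."""
    return list(product(*options_per_cn))


# Hypothetical state in which the first CN has 2 options and the second has 3.
options = [["wait", "default"], ["x", "y", "z"]]
joint = decision_set(options)
```

Here `joint` holds 2 x 3 = 6 tuples, one per joint scheduling option. The size of the decision set grows multiplicatively with the number of CNs that have a real choice in the state.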

3.2  One-Step Transition Probability

The epochs serve as the stages of the corresponding DP network [2], i.e.,
transitions occur only between states of adjacent epochs. Let $P_{ij}^\pi(v)$ denote the
one-step transition probability from state $s_i$ at epoch $v \in I_u$ to $s_j$ at epoch
$v + 1$ under policy $\pi$. Also, let $Z_i$ be the total transition rate of all transitions
in $D_i^\pi(v)$, and $A_i$ be the set of all destination states of transitions in $D_i^\pi(v)$,
i.e., $A_i = \{s_j : \Lambda_u(s_i, s_j) \in D_i^\pi(v)\}$. Depending on whether or not a task is
invoked at epoch $v + 1$, $P_{ij}^\pi(v)$ is determined as follows.

P1: $\omega_{u+1} > (v + 1)h$ (epoch when no task is invoked):

$$P_{ij}^\pi(v) = \begin{cases} \mu_{ij} h & \text{if } s_j \in A_i \text{ and } j \ne i, \\ 1 - Z_i h & \text{if } j = i, \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$

because the execution time of each activity is assumed to be exponentially
distributed. ($\mu_{ij}$ is the transition rate from $s_i$ to $s_j$.)

P2: $\omega_{u+1} = (v + 1)h$ (epoch when at least one task is invoked):

The transition probability in this case is similar to that of P1 except
that the time-driven transition function $\Theta_u$ is fired. Specifically,

$$P_{ij}^\pi(v) = \sum_{s_x \in B_j} \bar{P}_{ix}^\pi(v), \qquad (2)$$

where $B_j = \{s_x : \Theta_u(s_x) = s_j\}$, and $\bar{P}_{ix}^\pi(v)$ is the transition probability
obtained from P1 above.

3.3  One-Step  Cost 


To derive the one-step cost, we need to identify the cost associated with each
state $s_i$ first. To do this, the notions of goal state and its marked set are
introduced as follows. In the CTMC model, concurrent task executions can be
viewed as the movement of tokens in the corresponding GSPN. A place in the
GSPN is said to be marked if it has a token in it, implying the completion of
all activities preceding the place. In other words, a deadline in the task system
is essentially associated with the marking of a place. A place is time-critical if
a deadline is associated with it. A state $s_i$ containing a non-empty set $M_i$ of
marked time-critical places is then called a goal state with the marked set $M_i$.
Since a task is considered completed only after all its activities are finished, a
time-critical place is located only at the conclusion of the task containing the
place.

Let $\Phi_u$ and $\bar{\Phi}_i = \Phi_u - M_i$, $1 \le u \le l$, denote, respectively, the set of all
those places of $s_i$ that should be marked and that have not been marked at
the next task invocation time $\omega_{u+1}$. Since whether or not a task misses its
deadline is not known until its next invocation, it is natural to set the cost
function of the underlying MDP, i.e., the expected number of tasks missing
deadlines when the system is in $s_i$ at epoch v, as:

$$C_i^\pi(v) = \begin{cases} 0 & \text{if } \omega_{u+1} > (v + 1)h, \\ \sum_{j} \bar{P}_{ij}^\pi(v)\, |\bar{\Phi}_j| & \text{if } \omega_{u+1} = (v + 1)h, \end{cases} \qquad (3)$$

where $\bar{P}_{ij}^\pi(v)$ is the one-step transition probability prior to the time-driven
transition function $\Theta_u$ and $|\bar{\Phi}_j|$ the number of time-critical places of $s_j$ that
should be, but are not, marked (i.e., missing deadlines) at $\omega_{u+1}$.

3.4  Functional  Equations  and  Optimal  Policy 

Under the assumption that each CN has perfect observation of all $s_i$’s, one can
construct and solve, backward recursively, the following functional equations
for an optimal policy $\pi^*$:

$$U_V(s_i) = 0, \qquad (4)$$

$$U_v(s_i) = \min_{D_i^\pi(v) \in D_i} \Big\{ C_i^\pi(v) + \sum_{j} P_{ij}^\pi(v)\, U_{v+1}(s_j) \Big\}, \quad 0 \le v < V, \qquad (5)$$

where V is the last epoch of I, $C_i^\pi(v)$ is the one-step cost of Section 3.3, and
$U_v(s_i)$ the cost-to-go [2], or the total expected number of tasks missing deadlines
from epoch v to V under an optimal scheduling policy. Note that the minimum
expected number $J^*$ of tasks missing deadlines achieved under an optimal policy
$\pi^*$ is computed as $J^* = U_0(s_0)$, where $s_0$ is the unique starting state of the
CTMC model of the task system.
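
The backward recursion can be sketched generically as below. This is a standard finite-horizon dynamic programming loop under stated assumptions, not the authors' code: the cost-to-go is zero at the last epoch, and at each earlier epoch the decision minimizing one-step cost plus expected future cost is kept. The two-state toy problem, its decision names and its probabilities are invented for illustration.

```python
def backward_dp(states, num_epochs, decisions, trans, cost):
    """Return (cost-to-go at epoch 0, per-epoch policy) by backward recursion."""
    value = {s: 0.0 for s in states}               # terminal condition
    policy = [dict() for _ in range(num_epochs)]
    for v in range(num_epochs - 1, -1, -1):
        new_value = {}
        for s in states:
            best_d, best = None, float("inf")
            for d in decisions(s, v):
                total = cost(s, d, v) + sum(p * value[t] for t, p in trans(s, d, v).items())
                if total < best:
                    best_d, best = d, total
            new_value[s] = best
            policy[v][s] = best_d
        value = new_value
    return value, policy


# Toy problem: a task is either still running or done; a miss is charged at the last epoch.
STATES = ["run", "done"]
FINISH = {"fast": 0.5, "slow": 0.25}

def toy_decisions(s, v):
    return ["fast", "slow"] if s == "run" else ["idle"]

def toy_trans(s, d, v):
    if s == "done":
        return {"done": 1.0}
    return {"done": FINISH[d], "run": 1.0 - FINISH[d]}

def toy_cost(s, d, v, last=2):
    # Expected number of misses, charged only when the deadline falls at the next epoch.
    return (1.0 - FINISH[d]) if (s == "run" and v == last) else 0.0

value0, policy = backward_dp(STATES, 3, toy_decisions, toy_trans, toy_cost)
```

In the toy problem the optimal choice is always the faster activity, and the expected number of misses from the start is 0.5 cubed, i.e. 0.125.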

Consider again the task system of Fig. 2. Fig. 3 shows the decisions of $N_2$
under the optimal policy $\pi^*$, which is derived by dividing the planning cycle
I = [0, 10) into V = 1,000 intervals each of length h = 0.01. For example,
when the system is in state $s_3 = (2, 4, 6, 20, 22, 27) \in S_1$ (Appendix A) at
time $t \in I_1 = [0, 5)$, the optimal decision for $N_2$ (Fig. 3(a)) is to execute
$a_7$ if t < 0.89, and to execute $a_2$ (i.e., respond to $T_1$’s query) if t > 0.89.
Suppose the system is in $s_3$ at t < 0.89, when $N_1$ executes default activity
$a_{14}$, $N_2$ executes $a_7$ and $N_3$ sends a message to $N_2$. Assume further
that $a_{14}$ is finished at the next epoch t + h and thus brings the system to state
$s_6 = (6, 9, 20, 22, 27)$. Then, as shown in Fig. 3(a), the optimal policy for $N_2$
is still to execute $a_7$. However, if the message from $N_3$ arrives at $N_2$ before
$a_7$ or $a_{14}$ is completed (i.e., bringing the system to the state with marking
(2, 4, 6, 20, 22, 28)), then $N_2$ should instead reply to $N_3$ immediately. The optimal
policy in any other state and at any other epoch can be obtained similarly from
Fig. 3. Also, in several states within $I_2$, the reply is always executed before $a_7$
(Fig. 3(b)). This is obvious, since in these states $T_2$ is stuck waiting hopelessly
for a message that will never arrive. Hence, it is useless for $N_2$ to execute $a_7$
since $T_2$ will not be completed in time anyway.

In Fig. 3, several anomalies in the optimal decisions occur near the last
epoch of [0, 5) and [5, 10). For example, when the system is in $s_6$ at epoch
499, the optimal decision of $N_2$ as shown in Fig. 3(a) is $a_2$ rather than $a_7$. This
is because, from our approximation of continuous time with discrete epochs,
$a_2$ is a decision just as good as $a_7$ at the last epoch of [0, 5). As V gets larger
and larger, the derived $\pi^*$ will become closer and closer to the true optimum.
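
The size of the discretization error can be made explicit. For an activity with exponentially distributed execution time and rate $\mu$, the exact probability of completion within one interval of length h differs from the first-order term $\mu h$ only at second order. The numerical line below uses h = 0.01 and a rate of 6, the largest value quoted for the example; it is our own arithmetic, not a result from the paper.

```latex
\begin{align*}
\Pr\{\text{completion within } h\} &= 1 - e^{-\mu h}
   = \mu h - \frac{(\mu h)^2}{2} + O\big((\mu h)^3\big), \\
\mu = 6,\ h = 0.01: \quad 1 - e^{-0.06} &\approx 0.0582
   \quad \text{versus} \quad \mu h = 0.06 .
\end{align*}
```

Since h = L/V, doubling V halves $\mu h$ and roughly quarters the per-epoch error, which is consistent with the policy approaching the true optimum as V grows.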

The optimal cost for this example is $J^* = 1.3045$, which is rather high.
Further experiments show that this is because of the absence of default activities
for $N_3$ and the tightness of task deadlines. For example, $J^*$ immediately drops
to as small as 0.00056 when each task deadline is extended to 10 times the
original deadline. Also, from Eqs. (4) and (5), the computational complexity
of the above solution method is $O(V\,|S_u|\,|D|)$, where V is the total number of
epochs, $|S_u|$ the average number of states in a typical period $I_u$, and $|D|$ the
average number of available scheduling decisions in each state.

4  Optimal  Decentralized  TMSP 

Since  the  up-to-date  global  information  is  not  generally  available  to  each  CN, 
it  is  important  to  derive  an  optimal  decentralized  policy  such  that  the  expected 
number  of  tasks  missing  deadlines  is  again  minimized.  Under  this  policy,  each 
CN  makes  its  own  part  of  the  scheduling  decision  using  information  available 
to  the  CN  only.  The  information  available  to  a  CN  consists  of  the  current 
execution  stage  of  tasks  on  itself  (i.e.,  local  state)  and  possibly  out-dated 
information  on  the  other  CNs.  This  out-dated  information  is  the  local  state 
which  is  broadcast  periodically  from  each  of  the  other  CNs. 

Given a state broadcast frequency, system performance depends heavily
on how well each CN uses the broadcast information together with its own
local state to make scheduling decisions. This problem belongs to the class of
dynamic team decision problems [4], which are very difficult to solve except for
those with a one-step delay sharing (1-SDS) information pattern [1, 5, 16]. In our
decentralized TMSP, a variation of 1-SDS problems, the following dynamic
information is available to each CN at time t: its own local state at time t,
and the local states of all other CNs at time t - 1.

Ideally, the more frequently each CN collects the other CNs' local state information, the better its scheduling decisions will be. However, more frequent state broadcasts impose higher overhead on normal inter-task communications, and thus degrade the overall system performance. We shall determine the optimal frequency of broadcasts after the optimal decentralized policy for each CN has been derived.

In the following subsections, we first compute the delays both in state broadcasts and in normal inter-task communications due to the introduction of state broadcasts. Then, as in the centralized TMSP, we identify the important characterizing elements of the MDP such that the decentralized TMSP can again be solved with the DP technique. Unlike its centralized counterpart, however, the elements for the decentralized TMSP include: the probability states of the DP network, the set of admissible action rules, the probability state and its one-step transition probability, the one-step cost, and the functional equations.

4.1 Delays in State Broadcasts and Inter-Task Communications

Using the CTMC model and the periodic state broadcasts, the (communication) subsystem can be approximated by two single-server queues in parallel: an M/M/1 queue for inter-task communications and a D/D/1 queue for state broadcasts. Message communication delays thus depend on what portion of the subsystem's capacity is allocated to each of these two queues.

Let $\lambda_x$ and $b_x$, respectively, be the known arrival rate and average service need (e.g., in number of bits) of inter-task messages. Let $\lambda_y$ and $b_y$ be the given number of synchronous state broadcasts per unit time and the service need for each broadcast, respectively (to be elaborated further below). Then, the portion of the subsystem's capacity allocated to serving inter-task messages can be approximated by $\xi_x = (\lambda_x b_x)/(\lambda_x b_x + \lambda_y b_y)$ and that for state broadcasts becomes $\xi_y = 1 - \xi_x = (\lambda_y b_y)/(\lambda_x b_x + \lambda_y b_y)$.

Recall that, in the CTMC model without state broadcasts, the delay of an inter-task message $j$ was represented by an exponentially distributed random variable with rate $\mu_j$. So, the average sojourn (system) time $S$ of messages in the subsystem can be approximated by $S \approx \frac{1}{n}\sum_{j=1}^{n} 1/\mu_j$, where $n$ is the total number of inter-task messages within $I$. It follows from [6] that the service rate of the original M/M/1 queue is $\sigma_x = \lambda_x + 1/S$. Hence, the “adjusted” service rate of inter-task messages with broadcast messages considered becomes $\sigma'_x = \xi_x \sigma_x = \xi_x(\lambda_x + 1/S)$, and the average sojourn time $S'$ of the new M/M/1 queue for the inter-task messages turns out to be

$$ S' = \frac{1}{\sigma'_x - \lambda_x} = \frac{1}{\xi_x(\lambda_x + 1/S) - \lambda_x} = \frac{S}{\xi_x - \xi_y \lambda_x S}. \tag{6} $$

Notice that $0 < \xi_x - \xi_y \lambda_x S \le 1$; otherwise, if $\xi_x - \xi_y \lambda_x S = 0$, then the total traffic density of the M/M/1 queue will saturate the subsystem. Given $S'$ as above, the “adjusted” transition rate $\mu'_j$ of inter-task message $j$ then becomes


$$ 0 < \mu'_j = \frac{S}{S'}\,\mu_j = (\xi_x - \xi_y \lambda_x S)\,\mu_j \le \mu_j \tag{7} $$


as  a  result  of  introducing  state  broadcasts. 

Since the subsystem's capacity is in fact allocated indistinguishably among all the messages, the average service time of each inter-task message (state broadcast) at the M/M/1 (D/D/1) queue must be $W_x = \alpha b_x/\xi_x$ ($W_y = \alpha b_y/\xi_y$), where $\alpha$ is a fixed constant. It follows that $W_x/W_y = (b_x/\xi_x)/(b_y/\xi_y) = \lambda_y/\lambda_x$, and thus the relation $\lambda_x W_x = \lambda_y W_y$ holds. Hence, the state broadcast delay (sojourn time or service time in this case) $W_y$ can be expressed [6] as:

$$ W_y = \frac{\lambda_x W_x}{\lambda_y} = \frac{\lambda_x}{\lambda_y \sigma'_x} = \frac{\lambda_x}{\lambda_y\,\xi_x(\lambda_x + 1/S)}. \tag{8} $$

In the above derivation, while each inter-task message is treated as a single arrival at the M/M/1 queue, all the synchronous state broadcasts at a given epoch are treated as a single arrival at the D/D/1 queue. This is not unreasonable because state information is assumed to be simultaneously broadcast at a given epoch by all CNs. Also, the inter-broadcast interval $1/\lambda_y$ must always be larger than $W_y$ to prevent the broadcast messages from saturating the subsystem. Therefore, in the following discussions, we assume that each CN broadcasts state information only after the previous broadcast has been received by all other CNs. (This can be accomplished, for example, with the method in [3].) In terms of the theory of stochastic control, this assumption allows the separation of estimation from control [16], which is essential for the applicability of the DP technique.

In what follows, we assume that each CN broadcasts its state $B$ times within $I = [0, L)$ with inter-broadcast interval $g = L/B > W_y$. That is, the first broadcast is made at $t = 0$, the last at $t = (B - 1)g$, and all messages broadcast at $t$ will be received at $t + W_y < t + g$.
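
The delay relations of this subsection can be checked numerically. The following is a minimal sketch, not part of the original solution method: the function name `broadcast_delays` and its argument names are our own, and the inputs are the parameter values of the worked example in Section 4.6 ($\lambda_x = 1$, $S = 4/5$, $\lambda_y = 2$, $b_y = 0.2\,b_x$). Exact rational arithmetic is used so that the results can be compared directly with the fractions quoted there.

```python
from fractions import Fraction


def broadcast_delays(lam_x, S, lam_y, by_over_bx):
    """Return (xi_x, adjusted service rate, S', W_x, W_y) per Eqs. (6) and (8).

    lam_x      -- arrival rate of inter-task messages
    S          -- average sojourn time without state broadcasts
    lam_y      -- number of state broadcasts per unit time
    by_over_bx -- ratio b_y / b_x of the two service needs
    """
    # Capacity shares; b_x cancels, so only the ratio b_y / b_x matters.
    xi_x = lam_x / (lam_x + lam_y * by_over_bx)
    xi_y = 1 - xi_x
    sigma_x = lam_x + 1 / S          # service rate of the original M/M/1 queue
    sigma_adj = xi_x * sigma_x       # "adjusted" service rate
    denom = xi_x - xi_y * lam_x * S
    if denom <= 0:
        raise ValueError("broadcast load saturates the subsystem")
    S_adj = S / denom                # Eq. (6)
    W_x = 1 / sigma_adj              # service time of an inter-task message
    W_y = lam_x * W_x / lam_y        # Eq. (8)
    return xi_x, sigma_adj, S_adj, W_x, W_y


# Parameter values taken from the worked example in Section 4.6.
result = broadcast_delays(Fraction(1), Fraction(4, 5), Fraction(2), Fraction(1, 5))
print(result)
```

With these inputs the sketch returns 5/7, 45/28, 28/17, 28/45 and 14/45, matching the values derived in the example, and it raises an error at $\lambda_y = 25/4$, where the denominator of Eq. (6) vanishes.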


4.2  Probability  States  of  the  DP  Network 

Since the current global state is unavailable, we need to use other information as the “state” in our DP network. For each epoch $v$ in $I_u = [w_u, w_{u+1})$, define a probability state as the vector $p = (p_1, p_2, \ldots, p_{|S_u|})$, where $p_i$ is the marginal probability that the system is in $s_i$ at epoch $v$, and $S_u$ is the global state space in $I_u$. Obviously, there are an infinite number of probability states at each epoch $v$. However, since (i) both $|S_u|$ and the number of available scheduling options in each state are finite and (ii) $W_y$-time-unit-old global state information becomes available to all CNs once every $g$ time units, only a finite subset of probability states will be generated. These probability states serve as the set of “nodes” in the DP network on which the DP technique is applied for an optimal solution.

4.3  Set  of  Admissible  Action  Rules 

An action rule is a rule of action for a policy at an epoch, and a policy is the collection of action rules at all epochs. The set of action rules that $N_k$ can use within $I_u$ is the Cartesian product of the sets of scheduling options available to $N_k$ when $N_k$ is in each of its local states within $I_u$. For example, if the total number of local states that $N_2$ can be in within $I_1$ is, say, 3 and $N_2$ has 2 scheduling options for each of these 3 states, then the total number of action rules that $N_2$ can possibly take within $I_1$ is $2^3 = 8$. One such action rule
would be: “If $N_2$ is in the first, second and third local states, then it executes activities $a_1$, $a_7$ and $a_8$, respectively.” The entire set $\Gamma_u$ of action rules for all CNs within $I_u$ is the Cartesian product of the sets of action rules for all $m$ CNs.

Since some components of a probability state $p$ in $I_u$ may be zero, only a subset of all action rules are admissible. To determine this set of admissible action rules for a given $p$, we first identify the set of all local states of $N_k$ within $I_u$, each of which has a non-zero probability according to $p$. Second, identify the set of action rules for each of such local states for $N_k$. Third, construct the set of admissible action rules for each $N_k$ at $p$. Finally, the set of all admissible action rules $\Gamma_u(p)$ is then constructed as the Cartesian product of the sets of admissible action rules over all CNs. Obviously, only $\Gamma_u(p)$ needs to be considered in solving the TMSP at $p$.
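
The Cartesian-product construction described above can be sketched in a few lines. This is an illustration only: the function names, the local-state labels (`l1`, `l2`, `l3`, `u1`) and the scheduling-option labels are hypothetical and are not taken from the task system of Fig. 2.

```python
from itertools import product


def action_rules(options_by_state):
    """All action rules of one CN: one scheduling option per local state."""
    states = sorted(options_by_state)
    return [dict(zip(states, choice))
            for choice in product(*(options_by_state[s] for s in states))]


def admissible_action_rules(options_by_cn, reachable_by_cn):
    """Cartesian product, over all CNs, of the action rules restricted to the
    local states that have non-zero probability under the probability state."""
    per_cn = []
    for cn in sorted(options_by_cn):
        restricted = {s: opts for s, opts in options_by_cn[cn].items()
                      if s in reachable_by_cn[cn]}
        per_cn.append(action_rules(restricted))
    return list(product(*per_cn))


# Hypothetical data: N2 has 3 local states with 2 options each, N1 has 1 state.
options = {
    "N1": {"u1": ["x1", "x2"]},
    "N2": {"l1": ["y1", "y2"], "l2": ["y3", "y4"], "l3": ["y5", "y6"]},
}
all_rules_n2 = action_rules(options["N2"])
# Suppose local state l3 of N2 has zero probability under p.
admissible = admissible_action_rules(options, {"N1": {"u1"}, "N2": {"l1", "l2"}})
print(len(all_rules_n2), len(admissible))
```

Here $N_2$ alone has $2^3 = 8$ action rules, as in the example above, while dropping the zero-probability local state leaves $2 \times 2^2 = 8$ admissible joint rules for the two CNs.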

4.4 Probability State and One-Step Transition Probability

For the centralized TMSP, the one-step transition probability is, as described before, one essential element in its DP formulation. In the decentralized case, however, except at the epochs where broadcasts are received, it is the probability state, rather than the probability of jumping into a state, that is essential to the DP formulation. This is because the system always jumps from one probability state to another (of the next epoch) with probability 1. Suppose the system is in probability state $p$ at epoch $v \in I_u$. The probability states generated at epoch $v + 1$ from $p$ at $v$ and their one-step transition probabilities are determined depending on whether or not epoch $v + 1$ is a broadcast receipt epoch.

A: $v + 1$ is not a message receipt epoch:

From the definition of an admissible action rule $\gamma \in \Gamma_u(p)$, each $\gamma$ determines a unique scheduling decision for each CN and each global state $s_i \in S_u$ with $p_i \ne 0$. Given that the system is in $s_i$ at epoch $v$, one can use Eq. (1) or (2) to determine the probability $P^{\gamma}_{ij}(v)$ that the system will move to $s_j$ at epoch $v + 1$ if $\gamma$ is used. Given the marginal probability $p_i$ of $s_i$ at epoch $v$, one can then compute the marginal probability $p'_j$ of $s_j$ at epoch $v + 1$ as $p'_j = \sum_i p_i P^{\gamma}_{ij}(v)$. Therefore, the unique probability state $p' = (p'_1, p'_2, \ldots, p'_{|S_u|})$ generated at epoch $v + 1$ from $p$ at epoch $v$ is determined by:

$$ p' = p\,P^{\gamma}, \tag{9} $$

where $P^{\gamma}$ is the one-step transition matrix whose elements are the one-step transition probabilities $P^{\gamma}_{ij}(v)$ determined in Eq. (1) or (2). For notational simplicity, we write $p' = P^{\gamma}(p)$ to denote that $p'$ is the unique probability state from $p$ under $\gamma$.
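
Eq. (9) is an ordinary row-vector by matrix product. As a small sketch, assuming a hypothetical three-state transition matrix (the entries below are illustrative and are not derived from Eq. (1) or (2)):

```python
def propagate(p, P):
    """Return p' = p P for a row vector p and a row-stochastic matrix P."""
    n_cols = len(P[0])
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(n_cols)]


# Hypothetical one-step transition matrix under some action rule.
P_gamma = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
p = [0.5, 0.5, 0.0]
p_next = propagate(p, P_gamma)
print(p_next)
```

Because each row of the matrix sums to 1, the components of the resulting probability state again sum to 1.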

Since one $p'$ is generated from $p$ for each $\gamma \in \Gamma_u(p)$, a set of $|\Gamma_u(p)|$ branches of $p'$s is formed in the DP network at epoch $v + 1$ from $p$. Looking one epoch backward, $p$ itself is nothing but one branch of another set generated from a certain probability state at epoch $v - 1$, and so on (Fig. 4). Given a generic $p$ at epoch $v$ between two consecutive state receipt epochs, this relationship continues to hold through a particular probability state, written as $X(p)$, at the epoch when states were broadcast, and is finally rooted at a unique probability state at the epoch when the last state broadcasts were received. It is important to point out that each element of $X(p)$ represents the prior probability of a global state at the state broadcast epoch, and the particular global state realized at that epoch cannot be known until $W_y$ time units later. $X(p)$ plays an important role in determining the one-step transition probability, as will be described in the following case.

B: $v + 1$ is a message receipt epoch:

Since $W_y$-time-unit-old global state information is now available to each CN at epoch $v + 1$, the probability state at epoch $v + 1$ is not generated from $p$ at epoch $v$ by Eq. (9). Rather, from the idea of the 1-SDS information pattern, a total of $|S_u|\,|\Gamma_u|^s$ probability states can be generated directly at each of such epochs in $I_u$, where $\Gamma_u$ is the set of all action rules in $I_u$ and $s = W_y/h \ge 1$. Specifically, each of these probability states corresponds to: (i) the actual global state the system was in $W_y$ time units ago, and (ii) the specific sequence of action rules adopted from the state broadcast epoch to epoch $v + 1$. In other words, each of these probability states can be identified as an $(s + 1)$-tuple $(s_i, \gamma'(0), \gamma'(1), \ldots, \gamma'(s - 1))$, where $s_i$ is the actual global state the system was in $W_y$ time units ago, and $\gamma'(e)$, $e = 0, 1, \ldots, s - 1$, represents the action rule taken at the $e$-th epoch since the last state broadcast.

Similarly, each probability state $p$ at epoch $v$, when the state broadcasts are not yet available, corresponds to an $s$-tuple $(X(p), \gamma(0), \gamma(1), \ldots, \gamma(s - 2))$, where $X(p)$ is the root of $p$ at the state broadcast epoch, and $\gamma(e)$ is the action rule taken at the $e$-th epoch since the state broadcast. Notice that while the $\gamma(e)$'s represent the action rules which “brought” the system to probability state $p$ at epoch $v$, the $\gamma'(e)$'s are those that brought the system to $p'$ at epoch $v + 1$. In what follows, the one-step transition probability from $p$ to $p'$ is determined, depending on whether or not $\gamma(e) = \gamma'(e)$, $\forall\, e = 0, 1, 2, \ldots, s - 2$.

Given $p$ and $p'$ at epochs $v$ and $v + 1$, respectively, denote as $\gamma(s - 1)$ the action rule adopted at epoch $v$. Then, the one-step transition probability from $p$ to $p'$ can be determined easily as:

$$ Q^{\gamma(s-1)}(p, p') = \begin{cases} q_i & \text{if } \gamma(e) = \gamma'(e),\ \forall\, e = 0, 1, \ldots, s - 1, \\ 0 & \text{otherwise,} \end{cases} \tag{10} $$

where $q_i$ is the $i$-th component of $X(p)$ representing the marginal probability of the system being in $s_i$, which is the first component in the $(s + 1)$-tuple representation of $p'$.

4.5  One-Step  Cost 

The one-step cost $c^{\gamma}(v, p)$ at epoch $v$ for probability state $p$ under action rule $\gamma$ can be easily determined using the one-step cost $c^{\pi}(v, s_i)$ obtained from Eq. (3) for the centralized TMSP. Specifically, we have

$$ c^{\gamma}(v, p) = \sum_{i=1}^{|S_u|} p_i\, c^{\pi}(v, s_i), \tag{11} $$

where $\pi$ is the centralized policy corresponding to the decentralized action rule $\gamma$ when the system is in $s_i$, and $p_i$ is the $i$-th component of $p$.

4.6 Functional Equations and Optimal Policy

Unlike  the  centralized  case  for  which  only  one  pass  is  needed,  three  passes 
are  required  to  derive  the  optimal  decentralized  policy.  The  first  pass  is  to 
generate  all  the  probability  states  of  the  DP  network.  The  second  pass  solves 
the  functional  equations,  backward  recursively,  to  find  the  optimal  action  rule 
at  each  probability  state  p  and  each  epoch  v.  Since  p  is  not  observable,  a 
third  pass  is  needed  to  identify  the  optimal  action  rule  for  each  epoch.  In 
what  follows,  the  last  two  passes  are  described;  the  first  pass  has  already  been 
presented  in  Section  4.4. 

A. Functional Equations (the second pass)

As in the centralized case, the functional equation $U_v(p)$ is the cost-to-go, representing the expected number of tasks missing deadlines when the system is in $p$ at epoch $v$. $U_v(p)$ can be determined easily by backward recursion as follows.

$$ U_V(p) = 0. \tag{12} $$

If $v + 1$ is not a message receipt epoch, then

$$ U_v(p) = \min_{\gamma \in \Gamma_u(p)} \left\{ c^{\gamma}(v, p) + U_{v+1}\left(P^{\gamma}(p)\right) \right\}. \tag{13} $$

On the other hand, if $v + 1$ is a message receipt epoch, then

$$ U_v(p) = \min_{\gamma \in \Gamma_u(p)} \left\{ c^{\gamma}(v, p) + \sum_{p'} Q^{\gamma}(p, p')\, U_{v+1}(p') \right\}, \tag{14} $$

where $Q^{\gamma}(p, p')$ is the one-step transition probability from $p$ to $p'$ derived in Eq. (10). Notice that the minimum expected number, $J^*$, of tasks missing deadlines, which is achieved under the optimal decentralized policy $\gamma^*$, is equal to $U_0(p_0)$, where $p_0$ is the probability state representation of the unique starting state $s_0$ of the task system.

B. Optimal Policy (the third pass)

Let $\gamma_v(p)$ be the optimal action rule for $p$ at epoch $v$ derived in the second pass above. Then, the optimal decentralized action rule $\gamma^*_v$ we are interested in can be identified as the $\gamma_v(p)$ of one particular $p$, to be explained below. Notice that the probability state $p$ at epoch $v$ is not observable (and thus does not appear as part of the notation of $\gamma^*_v$) and is uniquely determined by the action rule used and, for broadcast receipt epochs, the out-dated state broadcast information as well. This means that the nodes (probability states) of the DP network visited by the optimal action rules form a path of $V$ nodes, each at a separate epoch from 0 to $V - 1$. Therefore, the optimal action rule $\gamma^*_v$ at epoch $v$ can be identified by following the nodes $p^*_v$ to be visited on the path as follows. We consider each of three intervals: (1) from epoch 0 to the first message receipt epoch $r_1$, (2) from the $i$-th to the $(i + 1)$-th message receipt epoch, $1 \le i \le B - 1$, and (3) from the $B$-th (last) message receipt epoch $r_B$ to the end of the planning cycle. Consider first the interval between epoch 0 and $r_1$. Obviously, $p^*_0$ can be initialized as

$$ p^*_0 = p_0. \tag{15} $$

Then, forward recursively,

$$ \gamma^*_v = \gamma_v(p^*_v), \quad 0 \le v < r_1, \tag{16} $$

and

$$ p^*_{v+1} = P^{\gamma^*_v}(p^*_v), \quad 0 \le v < r_1, \tag{17} $$

where $P^{\gamma}(p)$ denotes the unique probability state from $p$ under $\gamma$ as described in Eq. (9).

Next, consider the interval from $r_i$ to $r_{i+1}$, $1 \le i \le B - 1$. From the results of the second pass and using the global state contained in the received broadcasts, a unique probability state $p_{r_i}$ at epoch $r_i$ is determined. Similar to Eqs. (15)-(17), we have

$$ p^*_{r_i} = p_{r_i}, \tag{18} $$

$$ \gamma^*_v = \gamma_v(p^*_v), \quad r_i \le v < r_{i+1}, \tag{19} $$

$$ p^*_{v+1} = P^{\gamma^*_v}(p^*_v), \quad r_i \le v < r_{i+1}. \tag{20} $$

At epoch $r_{i+1}$, when state broadcasts are received again and used, a unique probability state $p_{r_{i+1}}$ can be identified by using the results of the second pass as well as the information contained in the broadcasts received. This process repeats itself for all such intervals between epochs $r_i$ and $r_{i+1}$ until the final message receipt epoch $r_B$.

Finally, within the interval from epoch $r_B$ to the last epoch $V$, one can derive:

$$ p^*_{r_B} = p_{r_B}, \tag{21} $$

$$ \gamma^*_v = \gamma_v(p^*_v), \quad r_B \le v < V, \tag{22} $$

and

$$ p^*_{v+1} = P^{\gamma^*_v}(p^*_v), \quad r_B \le v < V. \tag{23} $$

Obviously, $\gamma^* = (\gamma^*_0, \gamma^*_1, \ldots, \gamma^*_v, \ldots, \gamma^*_{V-1})$ is the optimal policy derived for the decentralized TMSP with periodic state broadcasts.

In  summary,  three  passes  are  needed  to  solve  the  decentralized  TMSP: 

P1. Generate the DP network (forward recursively) as described in Sections 4.2-4.4.

P2. Solve the functional equations, Eqs. (12)-(14), backward recursively using the one-step cost derived in Eq. (11) for the DP network generated.

P3.  Identify  the  optimal  decentralized  scheduling  rules  forward  recursively 
from  the  solutions  obtained  in  P2  using  Eqs.  (15)-(23). 
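
The three passes can be illustrated on a toy instance. The sketch below is a simplification under stated assumptions rather than the algorithm as implemented for the results in this paper: it covers only epochs that are not message receipt epochs, so it uses Eqs. (9), (11), (12) and (13) and omits Eqs. (10) and (14); every action rule is assumed admissible at every probability state; and the two-state transition matrices, costs and rule names `a` and `b` are hypothetical.

```python
from fractions import Fraction as F


def propagate(p, P):
    """Eq. (9): p' = p P for a row vector p and a row-stochastic matrix P."""
    return tuple(sum(p[i] * P[i][j] for i in range(len(p)))
                 for j in range(len(P[0])))


def solve(p0, rules, V):
    """rules maps a rule name to (transition matrix, per-state one-step cost)."""
    # Pass 1: generate the probability states reachable at each epoch.
    layers = [{p0}]
    for v in range(V):
        layers.append({propagate(p, P)
                       for p in layers[v] for P, _ in rules.values()})
    # Pass 2: backward recursion with U_V(p) = 0, Eq. (12), and Eq. (13).
    U = {(V, p): F(0) for p in layers[V]}
    best = {}
    for v in range(V - 1, -1, -1):
        for p in layers[v]:
            candidates = []
            for name, (P, cost) in sorted(rules.items()):
                step = sum(pi * ci for pi, ci in zip(p, cost))  # Eq. (11)
                candidates.append((step + U[(v + 1, propagate(p, P))], name))
            U[(v, p)], best[(v, p)] = min(candidates)
    # Pass 3: trace the optimal path forward from p0, as in Eqs. (15)-(17).
    policy, p = [], p0
    for v in range(V):
        policy.append(best[(v, p)])
        p = propagate(p, rules[policy[-1]][0])
    return U[(0, p0)], policy


# Hypothetical two-state system with two action rules.
rules = {
    "a": ([[F(1, 2), F(1, 2)], [F(0), F(1)]], [F(0), F(1)]),
    "b": ([[F(1), F(0)], [F(1, 2), F(1, 2)]], [F(1, 2), F(1, 4)]),
}
J_star, policy = solve((F(1), F(0)), rules, V=2)
print(J_star, policy)
```

For this toy instance the minimum expected cost is 3/8, obtained by using rule `a` at epoch 0 and rule `b` at epoch 1. Storing probability states as tuples of exact fractions lets identical states reached along different branches share one node of the DP network.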

As an example, consider again the task system in Fig. 2. Within $I = [0, 10)$, suppose inter-task messages $a_1$, $a_3$, $a_4$, $a_8$, $a_{12}$ and $a_{13}$ each occur only once, while $a_{10}$ and $a_{11}$ each occur twice, resulting in a total of 10 messages. Recall that $\mu_1 = \mu_3 = \mu_4 = \mu_8 = 2$ and $\mu_{10} = \mu_{11} = \mu_{12} = \mu_{13} = 1$. The average sojourn time $S$ of these inter-task messages within the communication subsystem is $S = \frac{1}{10}\sum_{j} 1/\mu_j = 4/5$. Since $\lambda_x = 10/L = 1$, the service rate of the M/M/1 queue is $\sigma_x = \lambda_x + 1/S = 9/4$. Let the arrival rate of state broadcasts be $\lambda_y = 2$ and the service need be $b_y = 0.2\,b_x$, where $b_x$ is the service need for each inter-task message. Then, the portion of capacity allocated to the inter-task messages is $\xi_x = b_x/(b_x + 0.4\,b_x) = 5/7$ and the adjusted service rate is $\sigma'_x = \xi_x \sigma_x = (5/7)(9/4) = 45/28$. From Eq. (6), $S' = 1/(\sigma'_x - \lambda_x) = 28/17 > 4/5 = S$, and $S/S' = 17/35$. From Eq. (8), the delay in broadcasting states becomes $W_y = \lambda_x W_x/\lambda_y = (1/2)(28/45) = 14/45$ since $W_x = 1/\sigma'_x = 28/45$. Notice that the inter-broadcast interval (0.5) is larger than $W_y$, satisfying the requirement that states are broadcast only after the previously broadcast states have been received. Also, to avoid saturating the communication subsystem, the maximum of $\lambda_y$ with $b_y = 0.2\,b_x$ occurs only when $S' = \infty$, i.e., $\sigma'_x = \xi_x \sigma_x = \lambda_x$ or $\xi_x = \xi_y \lambda_x S$ (Eqs. (6) and (7)). This occurs at $\lambda_y = 25/4$ when $\xi_x = 4/9$, where $W_x = W_y = \infty$.

To ease the computational difficulty imposed by the DP algorithm, the following approximations are made: (i) the planning cycle is discretized into 200 intervals, (ii) $W_y$ is discretized into only two stages, each of which contains several intervals, and (iii) the marginal probability of each state is discretized into 10 different intervals. Applying this approximation to the decentralized TMSP with $b_y = 0.2\,b_x$ and $\lambda_y = 0.4$, 0.8, 1.0, 2.0, 3.0 and 4.0, we obtained $J^* = 2.309$, 1.577, 1.445, 1.446, 1.489 and 1.543, respectively, showing that $\lambda_y = 1$ is the best among the six broadcast frequencies. To show that the optimal frequency depends on $b_y$, the same algorithm is applied again to the cases with $b_y = 0.5\,b_x$ and $\lambda_y = 0.2$, 0.4, 0.8, 1.0 and 2.0. The best broadcast frequency again turned out to be $\lambda_y = 1.0$, but with the corresponding $J^* = 1.544 > 1.445$. However, as shown in Fig. 5, the true optimal broadcast frequency of the former should be greater than that of the latter case. These results are not surprising since the communication subsystem now needs to allocate more capacity to deliver the same state information, and thus degrades the normal inter-task communications.

4.7  Computational  Cost  of  the  Solution  Algorithm 

Similar to any DP-based algorithm, the computational complexity of the proposed solution algorithm comes mainly from the total number of probability states in the underlying DP network. Let $|P|$ be the number of intervals into which the probability spectrum $[0, 1]$ is discretized and remember that $|S_u|$ is the average number of global system states in each interval $I_u$. Then, the total number of probability states generated will be $A = V\,|P|^{|S_u|}$, where $V$ is the number of epochs in a planning cycle. Recalling that the average number of action rules available to each system state is $|D|$, the computational complexities of the first (generating the DP network) and the second (solving the functional equations) passes are both $O(A\,|D|) = O(V\,|P|^{|S_u|}\,|D|)$, whereas that of the third pass (searching for the optimal policy) is $O(V)$. This makes the computational complexity of the overall algorithm $O(V\,|P|^{|S_u|}\,|D|)$. It is not surprising that, because of the difficulty of the decentralized TMSP, the resulting solution is not a polynomial algorithm. (In fact, all DP-based algorithms have exponential complexity.) Notice that the computational complexity of our solution to the centralized TMSP is $O(V\,|S_u|\,|D|)$ as pointed out before, which is much lower than that of its decentralized counterpart.
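
To give a feel for the gap between the two bounds, the following sketch simply evaluates them with constants ignored. The values $V = 200$ and $|P| = 10$ are the discretization settings of the example in Section 4.6 and $|S_u| = 19$ is the number of states listed for $I_1$ in Appendix A; $|D| = 2$ is an assumption made only for this illustration, and the function names are ours.

```python
def centralized_bound(V, S_u, D):
    """Operation count implied by O(V |S_u| |D|), constants ignored."""
    return V * S_u * D


def decentralized_bound(V, P, S_u, D):
    """Operation count implied by O(V |P|^{|S_u|} |D|), constants ignored."""
    return V * P ** S_u * D


V, P, S_u, D = 200, 10, 19, 2   # D = 2 is an assumed value
central = centralized_bound(V, S_u, D)
decentral = decentralized_bound(V, P, S_u, D)
print(central, decentral)
```

These are worst-case counts; as noted in Section 4.2, only a finite subset of the probability states is actually generated, so the decentralized figure is an upper bound on the work done.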

5  Conclusions 

Scheduling  combined  tasks  and  messages  is  important  in  distributed  real-time 
systems  since  time-critical  tasks  must  be  completed  before  their  deadlines  to 
prevent  possibly  catastrophic  consequences.  In  this  paper,  we  have  presented 
both centralized and decentralized algorithms for the problem of optimally scheduling periodic tasks and their inter-task communication messages to minimize the number of tasks missing deadlines.

Concurrent execution of tasks and inter-task message communications is first modeled as a sequence of continuous-time Markov chains (CTMCs), based on which the DP technique is applied to derive optimal scheduling policies. For the centralized case, the optimal policy is computed by assuming the up-to-date global system state is available to each computing node (CN). For the decentralized case, however, we assume that all CNs periodically broadcast their local states so that other CNs can make better scheduling decisions. The optimal decentralized scheduling policy and its optimal state broadcast frequency are derived by using the separation principle, i.e., separating state-estimation from decision-making.

Both optimal centralized and decentralized policies are derived off-line, and the results can be looked up when the system is in operation. Thus, the computational complexity of deriving such policies can be tolerated if the problem size is not too large. However, if the number of global states and the size of the policy space are large, then a simple but good approximation to this technique is important. This is a matter of our future inquiry.

Figure 1: GSPN Model of Default Activity. (a) SEND-RECEIVE-REPLY (receiving side, sending side); (b) QUERY-RESPONSE (responding side, querying side).

Figures: Optimal Scheduling Policy for N2

APPENDIX A

SYSTEM STATES OF FIGURE 2

A state s_i is represented as i ≡ { j | place j has a token }.


(a). I_1 = [0, 5)

State   Places holding a token
3       2 4 6 20 22 27
5       2 4 5 20 22 28
6       6 9 20 22 27
8       5 9 20 22 28
11      2 4 6 20 22 28
17      6 9 20 22 28
23      2 4 7 20 22 28
24      2 4 6 21 22 28
25      2 4 6 20 22 29
26      2 4 6 20 24 29
32      7 9 20 22 28
33      6 9 21 22 28
34      6 9 20 22 29
35      6 9 20 24 29
41      2 9 20 22 28
47      2 4 6 20 24 31
50      8 9 20 22 28
56      6 9 20 24 31
59      3 9 20 22 28


(b). I_2 = [5, 10)

State   Places holding a token
4       11 13 15 20 22 32
9       15 18 20 22 32
12      11 13 15 20 22 33
18      15 18 20 22 33
40      11 13 15 20 24 32
41      11 13 14 20 24 33
45      15 18 20 24 32
46      14 18 20 24 33
48      11 13 15 20 24 33
53      11 13 15 21 24 33
56      15 18 20 24 33
59      11 13 16 20 24 33
60      11 13 15 20 25 34
63      15 18 21 24 33
70      16 18 20 24 33
71      15 18 20 25 34
74      11 18 20 24 33
76      11 13 15 20 25 35
85      17 18 20 24 33
87      15 18 20 25 35
88      12 18 20 24 33


Comparing Formal Approaches for Specifying and Verifying Real-Time Systems

C.  L.  Heitmeyer,  R.  D.  Jeffords,  and  B.  G.  Labaw* 


1  Introduction 

Engineering embedded systems, such as the mission-critical computer (MCC) systems developed for the Navy, has become difficult due to the complexities of stringent requirements for hard-real-time performance, dependability, security, etc. Of particular importance is the development of unambiguous, complete, and consistent requirements specifications for such systems. Experience has shown that system errors found late in the development process often are the result of poorly understood requirements specifications [19]. Furthermore, such system errors are best detected as early as possible to avoid much more expensive modifications later in development [3].

One approach to this problem is the use of formal methods in specifying requirements and in verifying critical properties of those requirements. Formal methods, based upon precise mathematical theories and models, have the potential for improving the correctness of requirements specifications, especially those for the most critical aspects of the system such as security and real-time performance. It has been suggested (see, e.g., [20]) that these formal methods need to be tested on actual real-time systems. Such testing will allow the scalability of the methods to be assessed and will also uncover new problems requiring formal solution. A recent survey reports that formal methods have been used successfully in the development of actual systems, but that much needs to be done in such areas as integrating formal methods with informal methods in the development process and in introducing better formal models for hard real-time requirements [10].

The goal of our research is to understand better the hard real-time aspects of requirements. Recently, a large number of formal methods have been invented for specifying and verifying real-time systems. However, a greater understanding is needed of how they compare, e.g., what classes of problems they can solve, the availability and quality of mechanical support, etc. To provide insight into the utility of different methods for solving real-time problems, we have developed a generic version of a real-time railroad crossing system. We are using this example as a benchmark for comparing different formal approaches for specifying real-time systems and for analyzing their properties. In this paper, we provide a formal statement of the problem; describe three formal approaches that can be applied, namely, process algebra, model checking, and general-purpose theorem provers; and summarize efforts currently in progress to use each approach to specify the system of interest and prove properties about its behavior. In Sections 4 and 5, we describe initial results obtained with an approach based on the process algebra CSP and an approach using the general-purpose theorem proving system PVS, respectively.

2  Generic  Railroad  Crossing  (GRC)  Problem 

2.1  Background 

The original example, developed by Leveson to illustrate her software safety techniques [18] and extended to a real-time version to illustrate Modechart [16], involves a system operating a gate at a railroad crossing. The system must ensure that it cannot enter an unsafe state. In particular, it must satisfy a safety property: i.e., whenever a train is in the crossing, the crossing gate must be down.

* C. Heitmeyer, R. Jeffords, and B. Labaw are with the Naval Research Laboratory, Washington, DC 20375. The work reported here is supported in part by ONT grant N6092103WRW0066.

To make the problem somewhat more realistic, we have generalized it. While the previous
versions describe a system with a single track and at most two trains in the region of interest, both
traveling in the same direction, our version allows several tracks and an unspecified number of trains
traveling in both directions. In addition to the safety property, our version includes a utility property
to ensure that automobiles have fair access to the crossing. The purpose of the utility property is to
make sure the system performs its function and to avoid degenerate solutions, e.g., a solution that
lowers the gate and keeps it lowered. Safety-critical systems must not only operate safely; to be
useful, they must also perform certain functions within specified time intervals. We note that the
utility property requires bounded liveness, which turns out to be a safety property.

2.2  GRC  Problem  Statement 

The system to be developed operates a gate at a railroad crossing. The railroad crossing $I$ lies in
a region of interest $R$, i.e., $I \subset R$. A set of trains travel through $R$ on multiple tracks in both
directions. A sensor system determines when each train enters and exits region $R$. To describe the
system formally, we define a gate function $g(t) \in [0, 90]$, where $g(t) = 0^\circ$ means the gate is down
and $g(t) = 90^\circ$ means the gate is up. We also define a set $\{\lambda_i\}$ of occupancy intervals, where each
occupancy interval is a time interval during which one or more trains are in $I$. The $i$th occupancy
interval is represented as $\lambda_i = [\tau_i, \nu_i]$, where $\tau_i$ is the time of the $i$th entry of a train into the
crossing when no other train is in the crossing and $\nu_i$ is the first time since $\tau_i$ that no train is in
the crossing. Figure 1 shows two examples of occupancy intervals.

Given two constants $\xi_1 > 0$ and $\xi_2 > 0$, the problem is to develop a system to operate the
crossing gate that satisfies the following two properties:

Safety Property:  $t \in \bigcup_i \lambda_i \Rightarrow g(t) = 0$
(The gate is down in all occupancy intervals.)

Utility Property:  $t \notin \bigcup_i [\tau_i - \xi_1, \nu_i + \xi_2] \Rightarrow g(t) = 90$
(The gate is up as often as possible.)

Figures  2a  and  2b  illustrate  the  Safety  and  Utility  Properties. 
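
As an informal illustration of the two properties, the following Python sketch checks them at a finite
set of sample times. It is not part of the benchmark: the function names, the example gate behavior,
and the numeric values are all assumptions chosen for illustration, and sampling can only falsify a
property, not prove it.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # closed occupancy interval [tau_i, nu_i]


def in_any(t: float, intervals: List[Interval]) -> bool:
    """Return True if t lies in at least one closed interval."""
    return any(lo <= t <= hi for lo, hi in intervals)


def safety_holds(g: Callable[[float], float],
                 occupancy: List[Interval],
                 times: List[float]) -> bool:
    """Safety: at every sampled t inside an occupancy interval, g(t) == 0."""
    return all(g(t) == 0 for t in times if in_any(t, occupancy))


def utility_holds(g: Callable[[float], float],
                  occupancy: List[Interval],
                  xi1: float, xi2: float,
                  times: List[float]) -> bool:
    """Utility: outside every widened interval [tau - xi1, nu + xi2], g(t) == 90."""
    widened = [(tau - xi1, nu + xi2) for tau, nu in occupancy]
    return all(g(t) == 90 for t in times if not in_any(t, widened))


def example_gate(t: float) -> float:
    """Hypothetical gate: fully down in 'lowered', part-way (45) while moving."""
    lowered = [(8.0, 21.0), (38.0, 46.0)]
    moving = [(7.0, 22.0), (37.0, 47.0)]
    if in_any(t, lowered):
        return 0.0
    if in_any(t, moving):
        return 45.0
    return 90.0


# Hypothetical occupancy intervals and sample times 0.0, 0.5, ..., 60.0.
OCCUPANCY = [(10.0, 20.0), (40.0, 45.0)]
TIMES = [k * 0.5 for k in range(121)]
```

With xi1 = 3 and xi2 = 2, `example_gate` passes both checks on these samples. A gate that is always
down (`lambda t: 0.0`) passes the safety check but fails the utility check, which is the degenerate
solution the Utility Property is meant to exclude.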

3  Formal  Approaches 

Several  formalisms  are  available  to  specify  the  system  described  above  and  to  reason  about  its 
properties.  These  formalisms  fall  into  three  classes: 

•  General-Purpose Theorem Provers (e.g., Boyer-Moore [4], EVES [9] [17], EHDM [23],
PVS [21] [22] [26] [27], HOL [13]),

•  Model Checkers (e.g., Clarke's CTL [5], the Modechart verifier [28]), and

•  Process Algebras (e.g., CSR [12], Cleaveland's Concurrency Workbench [7], and CSP [11]).

We note that verification tools based on the two latter approaches, model checking and process
algebras, are highly specialized and provide verification with little human intervention. In contrast,
a proof generated with a mechanical theorem prover usually requires considerable human guidance.
Efforts are currently in progress to apply one or more examples of each approach to the GRC
problem. To develop insight into the styles of specification and verification that are most natural
and effective for a given approach, these efforts are, to the extent feasible, proceeding independently.

Our initial evaluation focused on the FDR (Failures Divergence Refinement) tool [11] for
automatically checking CSP specifications. Bill Roscoe of Oxford developed the original CSP
specifications of the GRC problem, which NRL has modified and extended. The analysis in Section 4
is based on the FDR version developed by NRL.

A second evaluation was based on a solution developed using the theorem-proving system PVS by
Natarajan Shankar of SRI International [25]. That solution used a real-time model based upon Unity
[6] with real-time constraints expressed using a past time operator Since. At NRL we experimented
with the same basic model, but used a complementary Till operator that expressed future time. The
analysis in Section 5 covers both of these efforts.

We  evaluated  the  suitability  of  the  CSP  and  PVS  solutions  using  criteria  defined  in  reference 
[8].  These  criteria  include  conciseness,  expressibility,  ease  of  use,  and  scalability.  In  evaluating 
expressibility,  we  compared  the  ease  of  expressing  both  the  system  specifications  and  the  properties 
of  interest. 


4  CSP  Solution 

4.1  Overview 

Verification  in  FDR  means  checking  that  one  CSP  process  refines  another  CSP  process.  In  the  GRC 
example,  it  must  be  shown  that  the  CSP  process  that  models  the  system  behavior  (which  we  treat 
as  the  specification)  refines  a  more  abstract  CSP  process  that  encodes  a  system  property,  such  as 
the  Safety  Property  or  the  Utility  Property. 
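
To make the idea of refinement checking concrete, here is a minimal Python sketch of traces
refinement between two small, explicitly listed transition systems. It is an illustration only and is
not how FDR is implemented: the dictionary representation, the event names ("down", "enter", "exit",
"up"), and the example processes are assumptions, and hidden (internal) events are not modeled. The
implementation refines the property if every event sequence it can perform is also allowed by the
property.

```python
from collections import deque
from typing import Dict, List, Tuple

# A labeled transition system: state -> list of (event, next_state).
LTS = Dict[str, List[Tuple[str, str]]]


def trace_refines(spec: LTS, spec_init: str, impl: LTS, impl_init: str) -> bool:
    """Return True if every finite trace of impl is also a trace of spec.

    The spec is determinized on the fly: each search node pairs one impl
    state with the set of spec states reachable by the same trace.
    """
    start = (impl_init, frozenset([spec_init]))
    seen = {start}
    queue = deque([start])
    while queue:
        impl_state, spec_states = queue.popleft()
        for event, impl_next in impl.get(impl_state, []):
            spec_next = frozenset(
                dst
                for s in spec_states
                for (ev, dst) in spec.get(s, [])
                if ev == event
            )
            if not spec_next:
                return False  # impl performs an event the spec forbids here
            pair = (impl_next, spec_next)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# Hypothetical property: a train may enter only while the gate is down.
SAFE_SPEC: LTS = {
    "Up": [("down", "Down")],
    "Down": [("enter", "Occupied"), ("up", "Up")],
    "Occupied": [("exit", "Down")],
}

# Hypothetical system that lowers the gate before the train enters.
GOOD_IMPL: LTS = {
    "S0": [("down", "S1")],
    "S1": [("enter", "S2")],
    "S2": [("exit", "S3")],
    "S3": [("up", "S0")],
}

# Hypothetical faulty system that lets the train enter first.
BAD_IMPL: LTS = {
    "B0": [("enter", "B1")],
    "B1": [("down", "B0")],
}
```

Here `trace_refines(SAFE_SPEC, "Up", GOOD_IMPL, "S0")` succeeds, while the same check on `BAD_IMPL`
fails because "enter" occurs while the property process is still in its "Up" state.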

The  current  version  of  FDR  does  not  have  an  intrinsic  model  of  time.  To  model  time,  we 
interleave  clock-pulse  events  among  the  other  system  events.  For  this  model  of  time,  it  is  essential 
to  verify  the  following  two  properties  before  addressing  the  Safety  and  Utility  Properties: 

1.  No-Deadlocks  Property.  The  specification  is  deadlock-free. 

2.  Non-Zeno  Property.  The  specification  does  not  exhibit  Zeno  behavior;  between  any  two 
clock-pulses,  there  is  never  an  infinite  number  of  other  events. 

In  our  verification,  we  defined  the  conjunction  of  properties  1  and  2  as  a  single  property,  called  the 
TimeOK  Property. 
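
The two parts of the TimeOK Property can be illustrated on an explicit finite transition graph. The
Python sketch below is an assumption-laden illustration, not the check performed by FDR: the
clock-pulse event is given the hypothetical name "tock", and for a finite graph Zeno behavior is
taken to mean a reachable cycle that contains no clock pulse.

```python
from typing import Dict, List, Set, Tuple

LTS = Dict[str, List[Tuple[str, str]]]
TOCK = "tock"  # hypothetical name for the clock-pulse event


def reachable(lts: LTS, init: str) -> Set[str]:
    """Collect every state reachable from init."""
    seen = {init}
    stack = [init]
    while stack:
        state = stack.pop()
        for _, nxt in lts.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def deadlock_free(lts: LTS, init: str) -> bool:
    """Every reachable state has at least one outgoing transition."""
    return all(bool(lts.get(s)) for s in reachable(lts, init))


def non_zeno(lts: LTS, init: str) -> bool:
    """No reachable cycle consists solely of non-clock events."""
    states = reachable(lts, init)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {s: WHITE for s in states}

    def has_cycle(s: str) -> bool:
        colour[s] = GREY
        for event, nxt in lts.get(s, []):
            if event == TOCK:
                continue  # a clock pulse breaks any Zeno run
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[s] = BLACK
        return False

    return not any(colour[s] == WHITE and has_cycle(s) for s in states)


def time_ok(lts: LTS, init: str) -> bool:
    """Conjunction of the two properties, as in the TimeOK Property."""
    return deadlock_free(lts, init) and non_zeno(lts, init)


# Hypothetical timed gate controller.
TIMED: LTS = {
    "Idle": [(TOCK, "Idle"), ("approach", "Lowering")],
    "Lowering": [(TOCK, "Lowered")],
    "Lowered": [(TOCK, "Lowered"), ("exit", "Idle")],
}
# Hypothetical Zeno system: events alternate forever with no clock pulse.
ZENO: LTS = {"A": [("ping", "B")], "B": [("pong", "A")]}
# Hypothetical deadlocking system: state B has no outgoing transition.
STUCK: LTS = {"A": [(TOCK, "B")], "B": []}
```

In this sketch `time_ok(TIMED, "Idle")` holds, `ZENO` fails the non-Zeno check, and `STUCK` fails the
deadlock check.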

4.2  Evaluation 

Conciseness 

FDR supports only restricted parameterization of CSP processes and numeric expressions.
Parameterization of high level processes formed by parallel composition of other processes is not allowed.
Similarly, numeric functions with arguments are not allowed in parameterized expressions.

In  our  experiments  we  found  both  these  limitations  too  restrictive,  and  used  the  UNIX  m4  macro 
tool  as  a  preprocessor  to  generate  CSP  specifications.  Full  parameterization  of  the  processes  and 
identifiers  would  have  been  more  convenient  if  included  directly  in  FDR. 

Expressibility  for  the  System  Specification 

CSP/FDR is particularly effective for modeling the control flow in concurrent finite state systems
that communicate via mutual synchronization. It is also convenient for specifying nondeterminism.
However, the following limitations of (untimed) CSP and the FDR tool hamper expressibility:

•  Because timing is an add-on feature, modeling delays and deadlines is awkward.

•  The omission of strong typing in CSP/FDR means that some common errors, such as undefined
constants, go undetected.

•  No graphical representations of CSP processes are available. Some users may prefer a graphical
representation to aid in comprehension of CSP processes.

An intrinsic weakness of this approach, shared with many model checkers, is that a concrete finite
model of the specifications is required. (This can also be viewed as an advantage, since concrete
models permit decision procedures that lead to completely automatic verification.) This manifests
itself in a number of ways:

•  Modeling data retained in the state of a process is weak; it is addressed only to a limited
extent by channels and parameterized processes.

•  Modeling is limited to finite state systems.

•  General specifications of relationships between arbitrary variables cannot be handled.

Ease  of  Use 

The basic use of FDR to check traces refinement is not difficult to learn. Traces refinement, a familiar
approach used in other systems, e.g., the Constrained Expression Toolset [2], is closely related to
regular language recognition. The verification process is straightforward and completely automatic.

That the FDR tool is a prototype is evident. The tool operates on textual specifications in a
largely batch-oriented style (e.g., a large chunk of specification is parsed at one time; error reporting
is minimal and often cryptic). Moreover, there is little tool support for creating specifications.
Although an excellent user manual and tutorial [11] are available, detailed instructions for using the
tool are generally lacking.

Scalability 

Our more ambitious experiments with FDR quickly led to unacceptable response times. The major
underlying cause was an exponential increase in the size of the CSP processes as system parameters
grew larger. The basic limitation to concrete models, together with the exhaustive search strategy
inherent in refinement checking, is the essential cause of this exponential increase.

The  compositionality  of  CSP  refinement  provides  an  approach  to  scalability  problems.  If  the 
most  abstract  version  of  a  system  refines  (satisfies)  a  property,  and  its  components  are  refined  (in 
the  dual  sense  that  we  refine  them  to  a  more  concrete  form  and  the  refinement  relation  holds) 
independently,  then  the  most  concrete  version  also  satisfies  the  property. 

This form of composition is straightforward if we are working from an abstract form to increasing
levels of detail. Our case was more difficult since we were working backward and wished to find a
more abstract form. The natural candidate for the more abstract form in GRC was a specification
that collapsed the multitude of trains into a single Abstract Train. With this approach we were
able to reduce the time to verify the TimeOK property from 18 minutes elapsed time to 2 minutes
elapsed time.
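
The following Python sketch illustrates, under simplifying assumptions, why a concrete model grows
exponentially with the number of trains and why collapsing the trains into one abstract train helps.
The three local train states and the collapsing rule are hypothetical; the paper does not describe
the actual Abstract Train process in this detail.

```python
from itertools import product
from typing import Iterator, Tuple

# Hypothetical local states of one train.
TRAIN_STATES = ("away", "approaching", "in_crossing")


def global_states(num_trains: int) -> Iterator[Tuple[str, ...]]:
    """Enumerate every combination of local states for num_trains trains."""
    return product(TRAIN_STATES, repeat=num_trains)


def count_global_states(num_trains: int) -> int:
    """Number of combined states: len(TRAIN_STATES) ** num_trains."""
    return sum(1 for _ in global_states(num_trains))


def abstract_train_state(global_state: Tuple[str, ...]) -> str:
    """Collapse all trains into one abstract train (assumed rule).

    The abstract train is in the crossing if any train is, approaching
    if any train is approaching, and away otherwise.
    """
    if "in_crossing" in global_state:
        return "in_crossing"
    if "approaching" in global_state:
        return "approaching"
    return "away"
```

For five trains the concrete product has 3**5 = 243 combined train states, while the abstract view
still has only three, whatever the number of trains.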

Expressibility  and  Validation  of  Properties

Validating  that  the  formal  expression  of  the  properties  in  FDR  is  equivalent  to  those  of  the  GRC 
problem  statement  is  difficult,  even  though  both  forms  are  formal.  The  major  difficulty  is  that 
the  language  for  expressing  the  properties  that  FDR  requires  is  the  same  CSP  language  used  to 
express  the  specifications  rather  than  a  more  abstract,  declarative  language  such  as  the  CSP  traces 
language.  Use  of  this  language  makes  it  awkward  to  express  properties  that  refer  to  state:  e.g.,  the 
Safety property can be paraphrased as "In-State-Crossing implies In-State-Down."

The ease of validation was improved by developing two separate independent versions of both
the Safety Property and the Utility Property. Our attempt to prove the equivalence of the different
versions led us to discover some errors. The final result was two sets of CSP specifications of these
two properties. By way of example, two versions of the Utility Property were formulated, one
deterministic, the other nondeterministic and easier to understand due to separation of concerns.

5  PVS  Solution 

5.1  Overview 

Specification and verification using a general-purpose theorem prover such as PVS requires both
the encoding of a real-time specification/verification method (RSVM) and the specification
and verification of the system in question. Optionally, one also has complete freedom to modify or
create anew some RSVM. Both the RSVM and system specification must be encoded as axioms,
definitions, proof strategies, etc. in the logic language supported by the theorem prover.

The  RSVM  developed  by  Shankar  and  encoded  in  PVS  [25]  consists  of  a  state-transition  model 
similar  to  Unity  [6]  with  a  non-decreasing  real-valued  time  associated  with  each  new  system  state. 
Timing  constraints  are  added  via  a  Since  operator,  which  provides  the  time  since  a  condition  defined 
on  the  state  variables  was  last  true  with  respect  to  the  current  system  state.  This  single  operator  is 
sufficient  to  express  both  deadline  and  delay  constraints  upon  a  system.  This  method  also  separates 
the  concurrent  behavior  of  the  system  from  the  timing  behavior. 

A complementary Till operator, which indicates the nearest future state when a condition
becomes true, was developed in our experiments at NRL. The Till operator was added to express
timing constraints in a more natural way than could be expressed by exclusive use of the Since
operator.
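
The backward-looking and forward-looking views can be contrasted with a small Python sketch over a
finite timed sequence of states. This is only an informal reading of the two operators, not the PVS
encoding of [25] or of the NRL extension: the trace representation, the function signatures, and the
example values are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
TimedTrace = List[Tuple[float, State]]  # (time, state), times non-decreasing


def since(trace: TimedTrace, idx: int,
          cond: Callable[[State], bool]) -> Optional[float]:
    """Time elapsed at position idx since cond last held (looks backward).

    Returns None if cond has not held at or before idx.
    """
    now = trace[idx][0]
    for time, state in reversed(trace[: idx + 1]):
        if cond(state):
            return now - time
    return None


def till(trace: TimedTrace, idx: int,
         cond: Callable[[State], bool]) -> Optional[float]:
    """Time from position idx until cond next holds (looks forward).

    Returns None if cond does not hold at or after idx in this trace.
    """
    now = trace[idx][0]
    for time, state in trace[idx:]:
        if cond(state):
            return time - now
    return None


# Hypothetical timed behavior of one train and the gate.
TRACE: TimedTrace = [
    (0.0, {"train": "away", "gate": "up"}),
    (2.0, {"train": "approaching", "gate": "up"}),
    (3.5, {"train": "approaching", "gate": "down"}),
    (6.0, {"train": "in_crossing", "gate": "down"}),
]
```

A deadline such as "the gate is down within d time units of the train approaching" reads directly as
`till(TRACE, 1, gate_down) <= d` when evaluated at the approach state, whereas the Since form must be
stated from the later state, looking back to when the approach condition last held.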

Conciseness 

The  language  of  PVS  is  Higher-Order  Logic  (HOL).  In  general,  HOL  provides  a  concise  notation  for 
expressing  system  specifications  as  well  as  the  encoding  of  the  RSVM.  In  many  situations  HOL  allows 
more concise specification of constructs than First Order Logic (FOL) since it allows parameters to
be functions; this is not allowed in FOL.

Expressibility  for  the  System  Specification

On the other hand, the use of HOL as a general-purpose specification language may not adequately
address concerns about domain-specific special notations, including graphics. The notations and
concepts of HOL such as quantifiers, lambda expressions, etc. are not likely part of the vocabulary
of the practicing engineer. It would be useful to have a front-end tool, fully integrable with PVS,
for experimenting with such domain-specific notations. PVS does anticipate part of this need by
providing LaTeX output, but this is limited to formatting standard mathematical notations.

Appropriate encoding of the RSVM into a general-purpose theorem prover can support the
development of quite general models. The model of the GRC in PVS allows an arbitrary (even infinite)
number of trains. Furthermore, it supports timing parameters as variables and relationships among
them rather than specific values; e.g., to ensure safety of the system, the sum of deadlines on moving
the gate down and any signaling or computation overhead must be less than the minimal time it
requires some train to reach the crossing after entering the region R. General specifications
promote ease of change, reuse, and better understanding of boundary conditions (e.g., the relationship
between gate and train crossing time mentioned previously).
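
The timing relationship quoted above can be written as a one-line check. The Python sketch below is
only an illustration with hypothetical parameter names; in the PVS model the parameters remain
symbolic, whereas here concrete numbers must be supplied.

```python
def safety_margin(gate_down_deadline: float,
                  overhead: float,
                  min_time_to_crossing: float) -> float:
    """Slack between the earliest train arrival and the latest gate closure."""
    return min_time_to_crossing - (gate_down_deadline + overhead)


def timing_constraint_holds(gate_down_deadline: float,
                            overhead: float,
                            min_time_to_crossing: float) -> bool:
    """True if gate deadline plus overhead is strictly less than the
    minimal time a train needs to reach the crossing after entering R."""
    return safety_margin(gate_down_deadline, overhead, min_time_to_crossing) > 0
```

The boundary case, where the sum equals the minimal travel time, fails the strict inequality; this is
the kind of boundary condition that a general, parameterized specification makes explicit.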

In the development of an RSVM, particular care must be taken that there is a balance between
expressibility concerns and others such as verification ease. In the case of FDR, the ease of verification
via an automatic decision procedure was the overriding consideration in limiting CSP processes to finite
concrete models. For the RSVM developed by Shankar, there may have been too much emphasis
upon ease of verification via the exclusive use of the Since operator for timing constraints. The timing
specifications using the Since operator are less understandable than the delay/deadline paradigm of
Modechart [16] since the former looks backward in time while the latter looks forward.

To investigate whether the delay/deadline paradigm could naturally be encoded in a PVS specification
(as well as to gain experience with PVS by actually proving new theorems rather than simply examining
proofs), we added the Till operator to the RSVM. We showed that it is feasible to use the Till operator
to make specification of timing more natural without excessive complication in verification for the
Safety property.

Ease  of  Use 

The current user interface to PVS is provided by emacs. This type of interface, although better
than a simple "dumb terminal" interface as in the case of FDR, seems somewhat dated in this age
of bit-mapped graphics. The user documentation assumes familiarity with emacs. For the emacs
novice it would be preferable to include the minimal subset of emacs required for running PVS as
part of the documentation (the standard emacs online tutorial is insufficient).

More importantly, the use of a general-purpose theorem prover requires both general theorem
proving skills (i.e., the mathematical maturity and expertise to develop manual informal proofs) and
expertise in the use of the tool, since there is considerable interaction between PVS and the
user. Furthermore, the level of effort required to formally prove a theorem with assistance from PVS
is at least an order of magnitude greater than that required to do an informal proof. For critical
applications, this effort may be worthwhile. Errors may be detected that might not otherwise be
found in the "social process" of peer review of informal manual proofs. Moreover, formal proofs may
lead to better understanding of all assumptions required for a complete proof [24].

Thus general-purpose theorem provers such as PVS appear to be more appropriate for
experimental use in developing verification paradigms (such as RSVMs) in special domains rather than
for production use by a practicing engineer.

That PVS implements a strongly-typed version of HOL is quite useful even if full proofs are not
attempted. The preliminary type-checking phase of PVS can eliminate common specification errors
related to type incompatibility.

Scalability 

In  many  ways,  general  models  scale  up  better  than  concrete  models.  For  example,  repetitive  similar 
constructs  (such  as  the  individual  trains  of  the  GRC)  are  not  much  more  difficult  to  specify  or  verify 
than  a  single  construct  in  PVS.  By  comparison,  extending  the  number  of  trains  in  the  concrete 
model  of  FDR  results  in  exponential  explosion. 

On the other hand, the difficulty of informal theorem proving in general, as exacerbated by the
level of detail required for formal, machine-assisted proof, makes scaling up very difficult. Two
approaches to simplifying proofs may be of benefit:

•  Development of a theory of composition of proofs that allows proofs to be developed for
components, and the components combined without having to reprove results for the overall system.
The compositionality of FDR is an example of this approach. A general approach to
composition is given in [1]. SRI is currently investigating composition for PVS.

•  Development of powerful lemmas and proof strategies that hide much of the detail of formal
proofs. Well-known decision procedures, such as those for Presburger arithmetic and numerical
inequalities, are provided in PVS. This eliminates much of the drudgery associated with
low-level details of formal proofs. Additional proof strategies for the RSVM need to be investigated
(our experimentation at NRL did not address this aspect).

Expressibility  and  Validation  of  Properties 

HOL is sufficiently expressive to provide high level expression of the properties (Safety and Utility) in
a form that may easily be validated against the formal statement in the GRC. The validation approach
used for FDR could also be applied: two or more statements of a property could be specified and
then proved to be equivalent, although this was not part of our experiments.


6  Summary 

Our initial experiments with CSP/FDR and PVS have given us considerable insight into the utility
of the process algebra and general-purpose theorem prover approaches for specifying and verifying
real-time properties. Analysing the different solutions to the GRC problem should help identify the
strengths of each approach and how each approach can be used productively to develop
industrial-strength real-time systems.


Figure 1. Two examples of occupancy intervals


Safety Property:  $t \in \bigcup_i \lambda_i \Rightarrow g(t) = 0$

Figure 2a. Safety Property specifies relation of gate position to occupancy intervals


Utility Property:  $t \notin \bigcup_i [\tau_i - \xi_1, \nu_i + \xi_2] \Rightarrow g(t) = 90$

Figure 2b. Utility Property specifies relation of gate position to occupancy intervals
and constants $\xi_1$ and $\xi_2$


AIRES:  An  Advanced  Integrated  Requirements  Engineering  System


AIRES is an advanced integrated requirements engineering system which has been developed to
assist users and designers in the preparation of correct requirements that accurately reflect user
needs. As such, AIRES considers natural language prose statements as the most natural media
format for user expression of system and software requirements information. AIRES supports
the management of the requirements engineering life cycle, as shown in Figure 1, through
modules and CASE tools directly aimed at support and automation of the specific processes
related to elicitation, organization, assessment, prototyping, and transformation.


[Figure 1 diagram: life cycle stages Elicitation, Organization, Assessment, Prototyping, and
Transformation to formal specifications. Annotation: supports interactive and iterative processes
and general uses such as maintenance, traceability, and reuse.]

Figure 1  Requirements Engineering Lifecycle

AIRES provides assistance in the elicitation and capture of requirements information from users
through an innovative conceptual approach of multi-group decision support systems whose
purpose is to elicit and capture information in a consensus building environment. The AIRES
assessment modules take this information and existing specifications from legacy systems,
condition these statements, and automate error discovery. Error correction is accomplished by
the user/designer working in consort. The AIRES prototyping module may work on
requirements information directly from the elicitation process and legacy systems or from the
outputs of the assessment modules. Once requirements sets are identified as prototype
candidates, AIRES, through its systems architecture, provides access to standard tools for
constructing prototypes. The final module of AIRES automates transformation from natural
language prose statements to the formal language of finite state machines through use of
StateMate, a commercial CASE tool. From this point onward traditional software development
processes take over, with the user and designer assured of correct requirements specifications
that are as error free as possible as the transformation to design commences.


Requirements  Engineering  Objectives  and  Needs

The ideal objectives in the management of the requirements engineering process are to develop
the characteristics of a new or modified system that are complete, consistent, correct, feasible,
maintainable, precise, traceable, testable, unambiguous, understandable, validatable, and
verifiable. While these characteristics may represent an ideal set of features, it is not possible to
assure that one or any group of these will be met in a given set of requirements. To demonstrate
that the foregoing is true, we need only examine one or two of these features. Clearly, it is not
possible to determine that any set of requirements is complete, for not even the user knows when
a system is complete. Understandability is another elusive feature in that we must ask to whom
and for what purpose is the set of requirements to be understandable.

Clearly, the question to ask is: What is needed to manage the requirements engineering process?
First, we must involve the user/designer from the outset of the process; minimize transformations
that alter the original meaning of the statement; assess requirements statements prior to
transformations so as to provide analysis of any errors introduced by the user rather than those
introduced by the transformation interpretation process; provide for full traceability, both
forward from the highest level and backward from the most detailed level; develop prototypes
for purposes of risk management; and finally, assign metrics to requirements statements to
provide for validation and verification during other phases of the development life cycle.

Two major goals for AIRES are to provide the user and designer automated support to enable
them to find things that should be in a set of requirements and, conversely, to find things that
should not be in a set of requirements before any irreparable damage is done to the requirements
through transformations to formal specifications.


AIRES  Architecture  and  Operation

AIRES represents a significant advance in the development of an enabling requirements
engineering process coupled with automated approaches to obtain the right requirements from
the user that may be made free from errors of certain types (ambiguity, conflict, redundancy,
inconsistency, etc.) without compromise of user intent. As illustrated in Figure 2, AIRES is an
integrated environment, supporting and spanning the requirements engineering life cycle.

[Figure 2 diagram; legible labels: "Software Engineering, Requirements Engineering, and AIRES"
and "Advanced Integrated Requirements Engineering System (AIRES)"]

Figure 2  AIRES

With the AIRES approach, each activity in the requirements engineering life cycle is supported
by a set of automated CASE tools, which taken together, comprise the suite of CASE tools in the
AIRES environment. Elicitation is supported by a computer supported cooperative work
environment that features a multi-group decision support system to enable groups of users and
designers to work together even if spatially or temporally distributed. In this approach to
elicitation, an evolutionary decision meeting model is formed in the context of overall software
systems engineering that adapts the management technology of systems engineering and applies
it to software to help ensure that correct software is designed, and not just that software designs
are correct. The organize function is aimed at facilitating user/designer groups in organizing
information such that related concepts are grouped together and high priority items are given
prominence. In the instance of grouping related concepts, we provide for conceptual clustering
algorithms for a variety of types of clustering. The role of priority setting falls upon the
user/designer reaching consensus as to which ones are most important for the system under
consideration. In the next step of the requirements engineering process, we engage in
assessment activities using classification and clustering of requirements sets. Classification is a
process which adds knowledge about requirements, while clustering gives us the means to form
meaningful groups of requirements. Prototyping enables the identification of requirements at
risk, namely those that continue to undergo change (volatile sets); those that contain ambiguities;
those that are inconsistent or in conflict; and those that require special handling to provide
understanding such as human computer interfaces or database management systems. Finally, we
look to the transformation of these sets of informal requirements statements to the formal
language of a specification. For AIRES, we have also selected a transformation to finite state
machine language such as that found in the commercial CASE tool "StateMate".

Syntax  of  Natural  Language  Requirements 

The key technological aspects of AIRES are realized through the utilization of the syntax of
natural language prose statements coupled with language semantics derived from domain
specific knowledge bases; rule sets; a predefined requirements taxonomy; and weighted
thesaurus sets of verbs and nouns, synonyms and antonyms, and key phrases for identification of
specific characteristics important for assessment, identification of high risk clusters and systems
features. This technique supports evaluation of requirements clusters for prototyping;
identification of errors such as conflict, inconsistency, and ambiguity; determination of coupling
and cohesion within and across requirements and clusters of requirements; determination of the
ripple effect impact of adding new requirements to existing systems (maintenance); storage and
retrieval functions (reuse) of specific requirements (or clusters of requirements); generation of
traceability matrices for complete auditing; and assessment of the degree of volatility in
requirements generation. All of this is directly aimed at the management of the requirements
engineering process and the management of risk in requirements engineering activities.

AIRES  and  Requirements  Engineering  Benefits 

AIRES provides several requirements engineering benefits. These include the ability to
facilitate the elicitation of requirements information through multi-group decision support
system aids; the ability to examine legacy system specifications for problems and errors; the ability to
correctly and exhaustively identify requirements specifications subject to impact through the
maintenance process of adding new functionality to these legacy systems; and the ability to
support requirements risk assessment and risk management. Other facets in AIRES include the
capability to archive and retrieve requirements specifications for reuse without the necessity of
significant investment to condition these in advance for reuse; the ability to perform forward and
backward traceability; the coupling of requirements to sizing and estimating tools; and the
capability to assign metrics to requirements specifications, in accord with the classification
taxonomy utilized, to facilitate the design of validation tests upon delivery.

There are several outcomes associated with AIRES operation. The application of taxonomies
and rule sets supports the determination of conflict within and across statements and clusters of
statements; the identification of consistency within and across statements and clusters of
statements; the definition of coupling and cohesion factors within and across pairs as well as
clusters of requirements; and the determination of completeness across requirements and their
clusters using domain knowledge. Additional outcomes in the application of taxonomies and
rule sets include the identification of potential ambiguity, redundancy, and internal
completeness; the definition of the ripple effect impact of adding new requirements to existing
systems; and establishing the storage and retrieval functions for specific requirements. The
application of classification and clustering techniques, in the context of predefined taxonomies
and associated rule sets, also supports the definition of the degree of volatility in the
requirements generation; the preparation of traceability matrices for complete auditing of all
requirements throughout the life of the project (including maintenance); and requirements
prototyping.


Bibliography


[AIKE89] Aiken, Peter R., A Hypermedia Workstation for Requirements Engineering, published Ph.D.
Dissertation, George Mason University, School of Information Technology and Engineering, Fairfax, VA, 1989.

[ARMO93] Armour, Frank, A Risk Management Approach for Prototyping Systems Requirements, published Ph.D.
Dissertation, George Mason University, School of Information Technology and Engineering, Fairfax, VA,
1993.

[BEAM88] Beam, Walter, Palmer, James D., and Sage, Andrew P., Systems Engineering for Software
Productivity, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-17, No. 2, March/April 1988.

[BROU92] Brouse, Peggy, A Process for Use of Multi-media Information in Requirements Identification and
Traceability, published Ph.D. Dissertation, George Mason University, School of Information Technology and
Engineering, Fairfax, VA, 1992.

[EMME93] Emmert, Barbara, Multi-Group Decision Support Systems: Integration and Analysis of Requirements
Information, published Ph.D. Dissertation, George Mason University, School of Information Technology and
Engineering, Fairfax, VA, 1993.

[FIEL91] Fields, N. Ann, An Evolutionary Group Decision Model Computer Supported Cooperative Work,
published Ph.D. Dissertation, George Mason University, School of Information Technology and Engineering,
Fairfax, VA, 1991.

[LIAN91] Liang, Yiqing, 1991, Software Requirements Classification, published Ph.D. Dissertation, George
Mason University, School of Information Technology and Engineering, Fairfax, VA, 1991.

[MYER88] Myer, Margaret, A Knowledge-Based System for Managing Software Requirements Volatility,
published Ph.D. Dissertation, George Mason University, School of Information Technology and Engineering,
Fairfax, VA, 1988.

[PALM90] Palmer, J. D., Liang, Yiqing, and Wang, Lillian, Classification as an Approach to Requirements
Analysis, Proceedings of the 1st ASIS SIG/CR Classification Research Workshop, Susanne M. Humphrey and
Barbara H. Kwasnik, Toronto, Ontario, Canada, November 4, 1990.

[PALM92] Palmer, J. D., and Liang, Yiqing, Indexing and Clustering of Software Requirements Specifications,
Information and Decision Technologies, April 1992.

[PALM92a] Palmer, J. D., and Fields, N. Ann, An Integrated Environment for Software Requirements Engineering,
IEEE Software, May 1992.

[PALM92b] Palmer, J. D., Fields, N. Ann, and Emmert, Barbara, Computer Supported Cooperative Work
Environment for Multiple Spatially Distributed Groups: A Case Study, 1992 IEEE International Conference on
Systems, Man, and Cybernetics, October 1992.

[PFLE89] Pfleeger, Shari L., An Investigation of Cost and Productivity for Object-Oriented Development,
published Ph.D. Dissertation, George Mason University, School of Information Technology and Engineering,
Fairfax, VA, 1989.

[SAMS89] Samson, D. E., 1989, Automated Assistance for Software Requirements Definition, published Ph.D.
Dissertation, George Mason University, School of Information Technology and Engineering, Fairfax, VA, 1989.




REQUIREMENTS  MANAGEMENT/REQUIREMENTS  ENGINEERING  (RM/RE) 


LUKE  CAMPBELL 

NAVAL  AIR  WARFARE  CENTER  AIRCRAFT  DIVISION 
FLIGHT  TEST  AND  ENGINEERING  GROUP,  SY30 
PATUXENT  RIVER,  MD  20670 
(301)  826-7601 
FAX  (301)  826-7607 


Requirements Management / Requirements Engineering (RM/RE)


To control costs, improve productivity, and decrease the resources required for development and support
of products, NAVAIR-S46 has developed a process which accounts for the structure of all
developments: Requirements Management and Requirements Engineering (RM/RE).

This RM/RE process is used throughout the life cycle of the product, and is itself under constant
improvement. RM/RE establishes, controls, tracks, and engineers requirements for avionics systems.

A cornerstone to RM/RE is the use of new automated software tools which greatly enhance the human
ability to manage, track, update and control requirements. These tools, together with new techniques
and methodologies, are combined in a process to ensure requirements are properly elicited, defined,
tracked, modeled, and tested throughout the life cycle.

RM/RE is the baseline process to which all other processes must attach; managing requirements allows
program personnel to manage all other aspects of the system engineering program. The primary steps
which we have defined for the RM/RE process are:

o Requirements Elicitation - the process of documenting the needs and resolving the disparities
among the involved stakeholders for the purpose of defining and distilling requirements to meet
the constraints of these stakeholders.

o Requirements Inspection/Validation - the process which involves checking the accuracy of the
products of the preceding operations, in order to validate that the requirements derived are an
accurate reflection of the user needs.

o Requirements Analysis - the process of studying and refining user needs in terms of system,
hardware, or software requirements.

The RM/RE Process is a combination of technical engineering disciplines and an administrative
process for establishing, validating and maintaining the many requirements of a system throughout its
life cycle.

A requirement is a statement of need, a characteristic of a need, or a condition or capability
that must be met or possessed by a system or system component to satisfy a contract,
standard, specification, or other formally imposed document or documents. The process of
defining requirements, within the context of RM/RE, encompasses the three interrelated
operations mentioned earlier: elicitation, inspection/validation, and analysis.

Application of these three interrelated operations is referred to as RM/RE. The foundation of
success for any project is dependent on the quality of its requirements and the management of
these requirements. Identification and correction of incomplete or inconsistent requirements early
in the system life cycle will help to alleviate costly and time-consuming modifications later, or
prevent development of a system which does not perform as required. Studies have shown that
requirements management errors contribute to the majority of system defects and are the most
costly to remove.

A poorly-written attempt to define a requirement will not accurately describe its author's need, will
not accurately translate this need, and will not be correctly interpreted by the developer of the
system. The result of a poorly written requirement which is not rectified will be a system
development defect.

A system development defect exists when the written specification does not describe the developed
product. System development defects can be divided into four broad categories: requirements defects,
design defects, code/build defects, and documentation defects. Each defect type produces downstream
problems, i.e., a requirements defect results in design defect(s) which then cause code/build defect(s)
and documentation defect(s). When defects are identified early in the development cycle, fewer
problems will be encountered as system development progresses.

Well-written requirements will be worded to make more than one interpretation unlikely, since
requirements are read and interpreted within the same organization by persons of varied backgrounds.

A well-written requirement will concisely describe a need. Accurate translation of this need by the
system developer is essential. Requirements will be written with the imperative "shall." When
statements written with "shall" are contained in contractual documents, they are legally binding.

Although human intervention is required during all phases of the development process, RM/RE should
be accomplished using automated tools. It is not practical to manually decompose, analyze, and trace
each requirement. A further advantage to automation is the resulting ease of change impact
assessments. Wherever changes are made, an automated requirements database will provide the
process for identifying all linked requirements which also are affected.
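
A change impact assessment over linked requirements can be sketched as a graph traversal. The
following is only an illustration under stated assumptions: the function name, the requirement
identifiers, and the choice to treat every link as bidirectional are hypothetical and are not taken
from any particular RM/RE tool. The result is a candidate list for human review, not a verdict.

```python
from collections import deque


def affected_requirements(links, changed):
    """Return every requirement reachable from `changed` through links.

    `links` is a list of (requirement, requirement) pairs. Links are treated as
    undirected (an assumption): a change may propagate to parents, children, and peers.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    seen.discard(changed)  # report only the other requirements
    return sorted(seen)


# Hypothetical identifiers, for illustration only.
LINKS = [("ORD-1", "SYS-1"), ("SYS-1", "SW-1"), ("SYS-1", "HW-1"), ("ORD-2", "SYS-2")]
print(affected_requirements(LINKS, "SW-1"))
```

Requirements with no path to the changed one (here ORD-2 and SYS-2) are left out of the list, which
is what keeps the assessment manageable on a large database.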

1.0  Requirements  Elicitation 

Requirements elicitation involves "capturing" thoughts and ideas from stakeholders and depositing these
ideas in a central repository. Keys to requirements elicitation include capturing the rationale for each
requirement, and reconciliation of requirements (using automated software tools) to assure that the
requirements are ordered and connected.

To date, the process step of elicitation has been haphazard and lacking in engineering rigor. Generally,
a specification is developed based on a previous program, and changes are added by various domain
experts. There are attempts to control the integration of requirements, but these generally fall short of
the need. The resulting product is often inconsistent, behind schedule, and ripe for problem reports or
change requests.

A better method for elicitation involves using automated software tools and techniques, such as
storyboarding and prototyping, in conjunction with a Requirements Elicitation Conference attended by
all stakeholders. This elicitation method allows the capture of requirements and the associated
rationale in a forum where all stakeholders are represented and a consensus is formed.

The elicitation method requires a more intense peak of activity for the stakeholders, which should be
accomplished quickly (in several days) with the use of techniques such as storyboarding (which
enhance the gathering of requirements), scribes (which assemble the requirements) and automated
tools (which quickly sort and categorize the requirements). Within the requirements elicitation process
are five subprocesses:

o  Capture  Requirements  from  Stakeholders 
o Requirements Reconciliation (Creation)
o  Initiate  Requirements  Traceability 
o  Stakeholders'  Input  Complete  Determination 
o  Establish  Database 

1.1  Capture  Requirements  from  Stakeholders 

Upon initiation of a program, inputs are received from the stakeholders; these inputs include the system
requirements and all applicable documents. The most advantageous manner in which to acquire the
requirements is in a Requirements Elicitation Conference, where all stakeholders are gathered in one
place to provide and reach a consensus on inputs, linkages, and rationale. New system requirements
originate from deficiencies in existing systems' capabilities due to new or enhanced technologies, or
due to newly identified threats. These deficiencies are examined and a Mission Needs Statement
(MNS) is issued which defines the need for the system. The MNS is translated by OPNAV into the
Operational Requirements Document (ORD). From the requirements of the ORD, a requirements
database is established. After establishing the requirements database, and as system development
progresses, there will be additional input to this process in the form of approved changes. These
requirements changes will be captured in the database, keeping a current baseline for the requirements
at all times. Automated requirements management tools are available for establishing, maintaining,
and updating the database.

1.2 Requirements Reconciliation (Creation)

During the Requirements Reconciliation (Creation) process, all linkages between requirements should
be defined and documented. This can be done using a requirements management tool. Requirements
should flow down from the high-level requirements in the ORD to the next succeeding lower-level
design specification. High-level requirements (parents) should be decomposed into lower-level
requirements (children) in the same document or in successive multiple lower-level documents.

Proper RM/RE ensures complete decomposed flowdown, so that there are no barren requirements (high
level requirements without children). Similarly, orphans (low-level requirements without parents) should
not exist. In addition to requirements in the parent-child domain, peer relationships, in which the linked
requirements populate the same level, likely will exist. Requirements which lack essential relationships
should be reconciled. The use of techniques such as storyboarding, lexical analysis, and rapid
prototyping should be employed to reconcile requirements which are barren or orphaned.
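
The barren and orphan checks described above lend themselves to a simple automated pass over the
parent-child links. The sketch below is illustrative only: the function name, the numeric level
scheme (0 for the ORD, larger numbers for lower-level documents), and the identifiers are
assumptions made for the example.

```python
def find_barren_and_orphans(levels, parent_of):
    """Find requirements that lack an essential parent-child relationship.

    levels:    requirement id -> document level (0 = ORD; larger = lower level).
    parent_of: child requirement id -> parent requirement id.
    """
    lowest = max(levels.values())
    parents_with_children = set(parent_of.values())

    # Barren: above the lowest level, yet nothing was decomposed from it.
    barren = sorted(r for r, lvl in levels.items()
                    if lvl < lowest and r not in parents_with_children)
    # Orphan: below the top level, yet not linked to any parent.
    orphans = sorted(r for r, lvl in levels.items()
                     if lvl > 0 and r not in parent_of)
    return barren, orphans


# Hypothetical data, for illustration only.
LEVELS = {"ORD-1": 0, "ORD-2": 0, "SYS-1": 1, "SYS-2": 1}
PARENT_OF = {"SYS-1": "ORD-1"}
print(find_barren_and_orphans(LEVELS, PARENT_OF))
```

Here ORD-2 is reported as barren and SYS-2 as an orphan; both would then go to the stakeholders for
reconciliation.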

1.2.1 Storyboarding is a modeling technique that extends from requirements analysis and simulation
methodology. A storyboard is a sequence of displays that represents the functions that the system
may perform when formally implemented. The intent of storyboarding is to provide a means of better
defining and validating system requirements. Storyboarding provides a means to:

o more precisely establish what to build and how to specify it,
o explore alternative designs and man/machine interfaces,
o promote synergism among stakeholders.

The use of storyboarding as a technique to verify requirements definitions and to establish and maintain
specifications provides several benefits. Characteristics of the system can be examined, user feedback
can be obtained, alternatives can be examined and evaluated, and problems can be detected early and
corrected.

1.2.2 Lexical Analysis is a methodology for examining the complexity, elements, and the relationship
of words or vocabulary from its grammar and construction. The process includes analyzing the syntax
of natural-language statements and then classifying the requirements according to similarities. Once
the requirements are classified, they must be examined for imprecise and ambiguous words; conflicts
among quality-metrics must be detected; and the identified problems must be presented to the
stakeholders for clarification and resolution.
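
One small part of lexical analysis, flagging imprecise wording, can be sketched as follows. This is
a minimal illustration, not the method of any named tool: the term list is a hypothetical sample,
and the check for the imperative "shall" reflects the wording rule stated earlier in this paper.

```python
import re

# Hypothetical sample of imprecise terms; a real list would come from the stakeholders.
VAGUE_TERMS = {"adequate", "as appropriate", "etc", "fast", "sufficient", "user-friendly"}


def flag_statement(text):
    """Return a list of findings for one natural-language requirement statement."""
    lowered = text.lower()
    findings = []
    if not re.search(r"\bshall\b", lowered):
        findings.append("missing 'shall'")
    for term in sorted(VAGUE_TERMS):
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            findings.append(f"vague term: {term}")
    return findings


print(flag_statement("The display shall be user-friendly and fast."))
print(flag_statement("The system will respond within 2 seconds."))
```

Findings like these are prompts for the stakeholders' clarification, since only they can say what
"fast" was meant to require.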

1.2.3 Rapid prototyping is a methodology which allows quick development of a preliminary design or
model of a system to define, analyze and/or validate the characteristics of a proposed system. It is a
method to verify specifications for completeness and effectiveness.

1.3  Initiate  Requirements  Traceability 

A database of proposed, or candidate, requirements should be established, and must be updated when
necessary. The inputs to this database include: system requirements, design requirements,
hardware/software requirements, and test requirements. Other associated data elements should be
entered for each requirement, such as method of testing, rationale, date of origination, linkages, etc.
The populating of the database is an iterative process, which begins as soon as the RM/RE process is
initiated. As requirements are identified, each is entered in the database as a requirement candidate,
pending reconciliation, traceability analysis, and stakeholders' approval. The database should provide
traceability of all requirements and any exceptions.
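
A single record in such a database might be sketched as below. The field names, the category
strings, and the two status values are assumptions chosen to mirror the data elements listed above;
they are not the schema of any actual requirements management product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Requirement:
    req_id: str
    text: str
    category: str        # "system", "design", "hardware/software", or "test"
    test_method: str
    rationale: str
    originated: date
    links: list = field(default_factory=list)  # ids of linked requirements
    status: str = "candidate"                  # every entry starts as a candidate


def ratify(req, reconciled, traceable, approved):
    """Promote a candidate only when all three pending steps are complete."""
    if reconciled and traceable and approved:
        req.status = "baseline"
    return req.status


# Hypothetical record, for illustration only.
example = Requirement(
    req_id="SYS-1",
    text="The system shall log every operator command.",
    category="system",
    test_method="demonstration",
    rationale="Needed for post-flight analysis.",
    originated=date(1993, 7, 20),
    links=["ORD-1"],
)
print(ratify(example, reconciled=True, traceable=True, approved=False))
```

Keeping the rationale and linkages on the record itself is what later makes traceability and
change impact questions answerable from the database alone.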

In general, a requirements database can be created and maintained using a number of commercially
available database management software packages. The benefits to using a database manager are the
ability to rapidly store, edit, organize, analyze, and retrieve large amounts of data.

Several  vendors  have  customized  database  managers  to  support  requirements  management  and 
analysis.  These  products  can  save  considerable  time  and  effort  needed  to  customize  generic  database 
managers  for  RM/RE  support. 

All requirements must result from a decomposition or synthesis of the higher level requirements defined
in the ORD. The requirements are analyzed for traceability to determine if any of the higher level
requirements were not decomposed, are not linked to a higher level requirement, or do not adequately
describe the higher level requirement. The resulting linkages form a requirements traceability "tree."
The traceability tree shows the hierarchical decomposition of the ORD requirements. The traceability
tree should be analyzed to ensure that there is complete traceability between the ORD and the lower
level specifications, as applicable (i.e., there are no missing or excess requirements).

If an ORD requirement does not decompose when translated into lower-level specifications, then either
the high-level requirement does not actually exist, and it should be removed from the ORD, or the
lower-level specification needs to be modified to include the decomposition. At a Requirements
Inspection Conference (RIC), all irregularities must be examined in order to reach a consensus on their
resolution. Each group of requirements that satisfies a single ORD requirement is analyzed to ensure
that the respective groups do not conflict with each other and that they completely satisfy the higher
level requirement.

1.4 Stakeholders' Input Complete Determination

Prior to certifying the establishment of a baseline requirements database, a decision must be reached
that all stakeholders' inputs have been elicited and documented. All requirements in the ORD must be
identified, and traceability mechanisms must be in use. This list of requirements becomes the
agreement between the OPNAV sponsor, PEO/PMA, and NAVAIR. It is imperative that requirements
are clearly understood and traced through to implementation. If input is not complete, the process of
obtaining requirements from stakeholders must continue. When the process is complete, the
requirements database can be defined as "baseline."

1.5  Database  Establishment 

After requirements have been reconciled, the stakeholders ratify that each candidate requirement is a
system requirement. The system requirements and associated data elements establish the
requirements baseline. These baseline requirements are embodied in a baseline requirements
database. This database is maintained and modified throughout the life of the program. Configuration
management procedures must then be applied to any request for modification of any requirement.

1.5.1  Database  Access. 

There are two subprocesses for database access: Queries and Reporting, and Database Modification.
Queries and Reporting provides the mechanism to access the database to retrieve information on the
requirements and associated characteristics residing in the baseline requirements database. The
information can be requested via individual ad hoc questions, or can be requested via ad hoc or
pre-programmed report formats. Presentation of the requested information is in the form of on-screen
individual responses, or on-screen or hard copy reports. Modifications to the baseline requirements
database can only be effected after all CM procedures have been fulfilled. The individual responsible
for database administration is the only person authorized to modify the database.
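
The two access paths, open querying and gated modification, can be illustrated with a toy in-memory
class. Everything here is a hypothetical sketch: the class and method names, the keyword query, and
the boolean CM flag stand in for whatever a real database manager and CM procedure would provide.

```python
class BaselineDatabase:
    """Toy in-memory stand-in for a baseline requirements database."""

    def __init__(self, administrator):
        self._administrator = administrator
        self._requirements = {}  # requirement id -> requirement text

    def query(self, keyword):
        """Ad hoc query: ids of requirements whose text mentions `keyword`."""
        keyword = keyword.lower()
        return sorted(rid for rid, text in self._requirements.items()
                      if keyword in text.lower())

    def modify(self, user, req_id, text, cm_approved):
        """Apply a change only for the administrator and only after CM approval."""
        if user != self._administrator:
            raise PermissionError("only the database administrator may modify the baseline")
        if not cm_approved:
            raise PermissionError("configuration management procedures not fulfilled")
        self._requirements[req_id] = text


# Hypothetical usage, for illustration only.
db = BaselineDatabase(administrator="dba")
db.modify("dba", "SYS-1", "The radar shall report track updates.", cm_approved=True)
print(db.query("radar"))
```

Any user may call `query`, while `modify` refuses both an unauthorized user and an unapproved change.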

2.0  Requirements  Inspection/Validation 

Requirements Inspection/Validation is the process of determining and linking the relationships between
technical documentation. Inputs to this top level process include the entire realm of military and DoD
standards, plus the system documentation available at the time of inspection: the ORD, the System
Specification, and the preliminary SOW.

Another aspect of Requirements Inspection/Validation is holding conferences to review traceability
deficiencies. Barren requirements, orphan requirements, ambiguous, untestable, and negatively stated
requirements all must be examined. The practice of holding conferences promotes communication
among team members. Outputs from the Requirements Inspection/Validation process include
program-applicable standards, with the appropriate sections linked to the requirement(s) that reference them, as
well as coordinated (properly linked and traceable) system requirements. Within the Requirements
Inspection/Validation process are three subprocesses:

o Requirements Validity Determination
o  Authenticated  Requirements  Determination 
o  Forward  Inputs  to  Baseline  Requirements  Database 

2.1  Requirements  Validity  Determination 

The result of Requirements Inspection/Validation determines the validity of the requirement update. If
the process ascertains that a proposed requirement update is invalid, the proposed update must be
re-analyzed in the Requirements Analysis Process.

2.2  Authenticated  Requirement  Determination 

The Stakeholders' Inspection/Validation authenticates the requirement update. Authentication is
approved by a single point of authority designated by the PMA. If authentication is not approved, then
the proposed update must be re-analyzed in the requirements analysis process.

2.3  Forward  Updates  to  Baseline  Requirements  Database 

Approved updates are forwarded to the individual responsible for database administration for inclusion in
the database.

3.0  Requirements  Analysis 

Requirements Analysis is the process of determining applicable standards, performing trade studies,
reconciliation (or refinement), and cost/schedule impact analysis for each requirement. This process is
performed iteratively on each new requirement. Important to this process is paying particular attention
to the interaction between requirements. As development progresses and more is learned about the
system, resultant changes will affect this interaction.

A change to a requirement may have a significant effect on other system requirements. Therefore, a
change impact analysis must be carried out for every change request or proposed remedial action.
Impact assessment is an iterative process. Once the affected requirements have been identified, all
directly associated requirements must be identified. These associated requirements must be assessed
for impact. This process continues until no associated requirements are affected. The subprocesses
for the Requirements Analysis Process are:

o Determine Applicable Standards
o Trade Studies
o Requirements Reconciliation

3.1  Determine  Applicable  Standards 

From system requirements, DoD STDs, MIL STDs, and DoD Instruction 5000.2, the appropriate
requirements can be determined for the following disciplines: maintainability, reliability, survivability,
safety, EMC/EMI, human factors, ILS, QA, producibility.

Inputs such as specification numbers and paragraphs for each requirement from other functional
disciplines within an organization must be solicited and coordinated. The output will be linkages
between requirements and paragraphs in the standards.

3.2 Trade Studies

Trade Studies are the inquiry and investigation of the results of choosing one method of accomplishing
a requirement over another. This usually takes place in terms of off-the-shelf solutions versus building
requirement-specific solutions ("make or buy" decisions). Trade Studies should be accomplished
before proceeding to the next RM/RE process step of Requirements Reconciliation (Refinement).

3.3  Requirements  Reconciliation  (Refinement) 

Requirements Reconciliation (Refinement) is the process of integrating requirements within models and
prototypes to evolve a picture of the completed system.

3.3.1 Specification Modeling

Specification Modeling is used to evaluate the correctness and performance of a system specification.
Specification Modeling provides a more easily understood translation of the specification's requirements.
Modeling ensures the completeness and correctness of the specification and validates that the
requirements meet the needs of the end-users.

Specification Modeling is accomplished by translating the specification's written words, using a
structured methodology, into a graphical and operable/executable language. Using this language, the
specification can be visualized, analyzed, and operated by end-users. If execution of the model
indicates that the system will fail to meet the users' requirements, the model is modified until a
specification evolves that meets the requirements.

3.3.2  Architectural  Modeling 

Architectural Modeling is used to evaluate the correctness and performance of the architectural design
of a system. Both hardware and software performance are evaluated by simulating the model derived
from the system specification. Correctness is evaluated by executing, during the simulation, assertions
(consistency constraints) that the evaluator/analyst/designer attaches to each design specification
component.
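
The idea of executing assertions during simulation can be shown with a deliberately tiny model. The
sketch below assumes a strictly sequential pipeline of components with fixed latencies, which is far
simpler than a real architectural simulator; the component names, latencies, and timing limits are
hypothetical.

```python
def simulate(pipeline, assertions, start_ms=0.0):
    """Advance a simulated clock through each component and check its assertion.

    pipeline:   list of (component name, processing latency in ms), in execution order.
    assertions: component name -> function taking the elapsed time (ms) and
                returning True when the consistency constraint holds.
    """
    clock = start_ms
    violations = []
    for name, latency_ms in pipeline:
        clock += latency_ms
        check = assertions.get(name)
        if check is not None and not check(clock):
            violations.append((name, clock))
    return clock, violations


# Hypothetical model, for illustration only.
PIPELINE = [("sensor", 2.0), ("bus", 1.5), ("display", 4.0)]
ASSERTIONS = {
    "bus": lambda t: t <= 4.0,      # data must leave the bus within 4 ms
    "display": lambda t: t <= 7.0,  # end-to-end update must finish within 7 ms
}
print(simulate(PIPELINE, ASSERTIONS))
```

With these numbers the bus constraint holds but the display finishes at 7.5 ms, so the run reports
one violation, attached to the component whose constraint failed.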

Automated tools can be used to improve the quality of complex systems and reduce the cost and time
required to design, implement, and optimize such systems. Tools that are capable of simulating a high
degree of concurrent processing are particularly well-suited to model computer hardware and software
systems within avionics systems, especially the timing considerations that are important to real-time
processing environments.

3.3.3 Prototyping

Prototyping is a technique used to demonstrate the desired functionality or behavior of a proposed
system. Although prototyping is often characterized as a risk management technique, it is actually a
requirements generation and validation technique that reduces risk through improved system
requirements. What makes prototyping so attractive (and effective) is that computer-aided tools have
made the technique an inexpensive and rapid means to address system requirements early in the
system development life cycle. Prototyping has the potential for improving a system engineering effort,
but cannot guarantee its success. It is important to understand that prototyping augments, rather than
replaces, the traditional system engineering process of specification, design, build, and test.

Prototyping has different meanings depending on whether it deals with hardware or software. For
hardware, prototyping typically refers to a well-defined activity in DEMVAL or E&MD, where a physical
model, exhibiting all the essential requirements as a guide for further production, is produced in
advance. In contrast, software prototyping is not associated with a specific acquisition phase and does
not rely on having the product characteristics well-defined in advance. Its primary goal is to serve as
a learning vehicle to provide more precise ideas or specifications about what the target system should
be. In many situations, a software prototype does not become part of the final product.

Although there are several categories of prototyping, exploratory prototyping, where the emphasis is on
clarifying requirements of the target system, is commonly used during RM/RE. Exploratory prototyping
focuses on communication problems between developers and prospective end-users, particularly in the
early stages of system development. The developers often have limited information about the
intended application, while the end-users have no clear idea of what the system must do for them. In
this situation, a practical demonstration of possible system functions serves as a catalyst to elicit ideas
and promote a creative cooperation between all the stakeholders. Such a demonstration does not have
to focus on one particular solution, but can point out alternatives whose respective merits can be
discussed. The prototype is an aid for establishing the features a target system should incorporate.
These are subsequently codified in the systems specification.

ABSTRACT

Computer security, safety and resilience are usually implemented only after a system has been
developed. This leaves a lot of potential risks that must be accounted for at huge costs at
a later stage. This article takes computer security, safety and resilience right to the
beginning of the systems development life cycle - the user requirement specification.

Limited reference was found in the literature on how to determine the requirements for
computer security, safety and resilience. This article proposes a method for determining and
specifying computer security, safety and resilience requirements and to include these as part
of the user requirement specification.

By using this methodology a complete set of computer security, safety and resilience
requirements can be determined and specified as early as possible during the development
phase.

This methodology is based on the definition of a requirements matrix by a Constraints
Engineer. The importance of the different computer security, safety and resilience
requirements will be rated in relation to the functional requirements, and applicable counter
measures will be allocated. This will lead to justifiable costs for implementing computer
security, safety and resilience for applicable systems.

The complete set of computer security, safety and resilience requirements can be used as a
reference after implementation of the system to determine whether all the computer security,
safety and resilience requirements have been accounted for.


ABOUT  THE  AUTHORS 

D N J MOSTERT

A computer consultant. Currently working towards a PhD degree at the Rand Afrikaans
University.

S H VON SOLMS

Head of the department for Computer Science at the Rand Afrikaans University.




1  INTRODUCTION 

Computers are increasingly used in environments where failures cannot be tolerated
and where errors would have dangerously unpredictable results. [2]

Computer security, safety and resilience are usually only "implemented" after
development of the system. For systems such as high performance transaction-,
process control- and other safety critical systems, it has become important for computer
security, safety and resilience to be "part" of the system right from the user
requirements specification of the system. Therefore it is important to determine and
specify the computer security-, safety- and resilience requirements (hereafter referred
to as CSSR) as an integrated part of the functional system requirements of the
computerized system during the user requirement specification phase of systems
development. Limited reference was found in the literature on methods to
determine or specify computer security-, safety- or resilience requirements for
a computerized system.

This article will propose a methodology used on two levels to determine and specify
computer security, safety and resilience requirements as part of the user requirement
specification.


The first level of this methodology will be known as the Constraints
Acquisition Methodology (CAM) and will be used during determination of
systems requirements.

The second level of this methodology will be used during determination of the
detail software requirements for the system and will be known as CAM/S.

A Constraints Engineer is a specialist regarding computer security, safety and
resilience for a specific domain of systems. The Constraints Acquisition Methodology
(CAM) and Constraints Acquisition Methodology for Software (CAM/S) are the tools
the Constraints Engineer will use to make sure that the CSSR requirements are part
of the user system requirement specification and of the user software requirements.

Section 2 will show where CAM and CAM/S extend the existing computer
systems engineering process to include the CSSR requirements.

Section 3 will briefly explain the steps to determine the functional system
requirements.

Section 4 will discuss the Requirements- and Counter-measure
matrixes, which form the basis for the methodology to determine and
specify computer security, safety and resilience requirements
as part of the user requirement specification.

Section 5 of this article will explain the Constraints Acquisition
Methodology for CSSR elicitation, using the matrixes introduced in Section 4.

Section 6 will explain the Constraints Acquisition Methodology for
Software. This section will show the relationship between CAM and CAM/S.
CAM/S will be used for CSSR elicitation during user software requirements
determination and specification.




2  COMPUTER  SYSTEMS  ENGINEERING  AND  CAM  /  CAM/S 


This section will show where CAM and CAM/S fit into the existing computer
engineering processes.

The process from exploring the need (as specified by the user or as dictated by the
business), up to implementing a computerized system can be defined as computer
systems engineering. This process includes tasks like:

•  specification of the system requirements,

•  the  software  development  life  cycle  and 

•  the  hardware  development  life  cycle. 

User requirements specification will be done during each of these tasks in more or
less detail. During the task of specifying the system requirements it is important to
determine and specify the global system functions (hardware and software functions).
Therefore it is important to determine the CSSR requirements bearing in mind the
global picture of the system. During the systems development life cycle it is
important to expand the "system requirements" into detail software requirements. At
this stage it is important to also determine the CSSR requirements for each of those
detail software requirements. This will be illustrated in Section 6 of this article.
The process of computer systems engineering can be represented by the following
diagram, which also shows where CAM and CAM/S fit into the whole process. [1]


Figure  1  -  The  process  of  Computer  systems  engineering 
and CAM, CAM/S

2.1  SYSTEM REQUIREMENTS - CAM

The first step in the development of a computerized system can be characterized by
the specification of the system requirements. This step precedes both hardware and
software engineering.

The major objectives [1] of this step are to:

•  evaluate the systems concepts for feasibility, cost benefit and business needs

•  describe  the  system  interfaces,  functions  and  performances 

•  perform  preliminary  functional  systems  requirements  and  design.  Part  of  this 
will  be  to  specify  the  systems  requirements.  The  specification  of  system 
requirements  can  be  accomplished  in  different  ways.  It  can  be  an  informal 
description  of  the  requirements  or,  at  the  other  extreme  be  a  more  formal 
method of a mathematical description of the requirements. It can also be a
process of an informal description that "develops" into a more formal
description.

•  allocate functions to hardware, software and supplementary systems elements

•  establish cost and schedule constraints.

This paper will concentrate on the informal description (third objective) of systems
requirements. This informal system requirement will be extended to include CAM.

2.2  SOFTWARE DEVELOPMENT LIFE CYCLE - CAM/S


After determining the system requirements, certain user requirements (functions) are
allocated to be implemented in software. These functions will be expanded in an
attempt to satisfy mainly the following objectives [1]:

•  uncovering  the  flow  of  information 

•  describe  the  software  functions,  validation  requirements  and  design 
constraints. 

Keeping in mind the overall CSSR requirements it is also important to determine and
specify the CSSR requirements for these detail software requirements. Using CAM/S
as a tool the CSSR requirements can be determined and specified.

3  STEPS IN FUNCTIONAL SYSTEM REQUIREMENTS ACQUISITION


It is useful to describe the process of user functional system requirements acquisition
through the following steps [3]:

•  Elicitation 

•  Formalization 

•  Validation 

The methodology proposed in this paper will extend the elicitation and formalization
steps to include the CSSR requirements with requirements acquisition.

4  BASIS FOR THE CONSTRAINTS ACQUISITION METHODOLOGY (CAM)
&  CONSTRAINTS ACQUISITION METHODOLOGY FOR SOFTWARE
(CAM/S)

Determining the functional system requirements can be a very complex and tedious
process. Proven methods and methodologies exist for determination and specification
of the functional requirements. The determination of CSSR requirements received
very little attention in these methodologies. The lack of a method (how) to determine
and specify CSSR requirements during this phase inspired this study.

4.1  DOMAIN OF SYSTEMS

If we consider, for example, a domain of runway control systems, typical objects are:
runways, aircraft, pilots and control towers. These objects are related to each other
in a specific way, and therefore the CSSR requirements and counter measures that
will be implemented will be the same for all systems in this domain. Only the degree
of importance of the different CSSR requirements and counter measures will vary
from system to system.


4.2  THE REQUIREMENTS- AND COUNTER MEASURE MATRIXES AND THE
ROLE OF THE CONSTRAINTS ENGINEER

CAM and CAM/S are based on two sets of matrixes, one set for the requirements and
one set for the counter measures. Although the detail level of the user requirements
will differ between CAM and CAM/S the basis for using the two matrixes will be the
same. A set of matrixes will be defined for CAM and a different set of matrixes will
be defined for CAM/S.

A Constraints Engineer will complete these sets of matrixes. For the purpose of this
article, a Constraints Engineer is a computer security, safety and resilience specialist
who specializes in the domain of the required system. The paragraphs below describe
the role of the Requirements and Counter measure matrixes which form the basis of
CAM and CAM/S. The Constraints Engineer and his responsibility in setting up
these sets of matrixes will also be discussed.

4.2.1  Requirements matrix sets

During development of the CAM (to be discussed in section 5) and the CAM/S (to
be discussed in section 6) it was important to represent the CSSR requirements in
relation to:

•  the functional requirements of the system. For every functional requirement
of the system a specification of the CSSR requirements for that functional
requirement must be done.

•  the "environment" that the system will operate in. The users (people) of the
system are "part" of the system and are one of the major components that can
jeopardise or be jeopardised by the system. Therefore it is important to
specify the CSSR requirements relating to these people. An example of this
requirement can be to implement the division of responsibility.

Every form of technology that will be used to implement the system brings its
own kind of risk. Therefore it is also important to look at the CSSR
requirements in relation to the technology to be used. For example
implementing a system under the DOS operating system introduces a different
risk as opposed to a system implemented under UNIX.

To represent the relationship between a CSSR requirement, the functional
requirements of the system and the environment of the system a three dimensional
matrix (Requirements Matrix) is used. This matrix indicates the sub-elements of the
specific CSSR requirement on the x-axis, the functional requirements of the system
on the y-axis and the "environment" on the z-axis. [4] also represents the environment
in a matrix format.

The sub-elements of the CSSR requirements are:

•  Computer Security requirement:

   •  Confidentiality
   •  Availability
   •  Integrity
   •  Assurance

•  Safety requirement:

   •  Reliability
   •  Danger of being out of order

•  Resilience requirement:

   •  Availability
   •  Reliability
   •  Assurance


The Requirements Matrix (figure 2a) for computer security, the CS-Requirements
Matrix, can be represented as a three dimensional matrix with:

•  x-axis representing the sub-elements of computer security (confidentiality,
availability, integrity and assurance). These dimensions cover the whole
spectrum of computer security.

•  y-axis representing the high level functional requirements of the required
system. These requirements can be determined during sessions with the users
or can be domain knowledge and can be specified in different detailed levels.

•  z-axis representing the relevant components of the computer system (the
"environment" as discussed on a previous page).

Requirement matrixes must also be defined for Safety, the S-Requirements matrix,
and Resilience, the R-Requirements matrix, as indicated in the figure below
(Fig 2b & c).

Figure 2 - Requirements Matrixes

4.2.2  The Constraints Engineer

The methodology, CAM and CAM/S as described later in this paper, provides tools
to the Constraints Engineer that can be used to determine and represent the computer
security, safety and resilience requirements.

The Constraints Engineer creates the framework of a requirements matrix by assigning
the x, y and z-axes to each of the three matrixes. The initial matrixes will create an
overview of the framework which can be specified in more detail in CAM/S. This
framework of the requirements matrixes can be used to structure questions used to
determine CSSR requirements from the users.
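
The framework laid out by the Constraints Engineer can be pictured as a table keyed by one label from each axis. The Python sketch below is illustrative only and is not part of the methodology as published: the x-axis labels are the computer security sub-elements named in this paper, while the functional requirements and environment components are hypothetical runway-control entries chosen for the example, and the function name is an assumption.

```python
from itertools import product

# x-axis: computer security sub-elements named in the paper.
SUB_ELEMENTS = ["confidentiality", "availability", "integrity", "assurance"]
# y- and z-axis entries are hypothetical placeholders for a runway control system.
FUNCTIONAL_REQS = ["issue landing request", "allocate runway"]
ENVIRONMENT = ["pilot", "control tower", "operating system"]


def empty_requirements_matrix(x_axis, y_axis, z_axis):
    """Return a dict keyed by (x, y, z) with every cell still unrated (None)."""
    return {cell: None for cell in product(x_axis, y_axis, z_axis)}


# The CS-Requirements matrix framework: 4 x 2 x 3 unrated cells.
cs_matrix = empty_requirements_matrix(SUB_ELEMENTS, FUNCTIONAL_REQS, ENVIRONMENT)

# Each unrated cell corresponds to one question that can be put to the users.
cs_matrix[("confidentiality", "issue landing request", "pilot")] = 3
```

Keeping every cell explicit, even when unrated, makes it easy to see which questions still have to be asked.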

The importance of the CSSR requirements can be rated by the Constraints Engineer
on a scale from 0 to 3. A rating will be done for each cell of a three dimensional
requirements matrix. Possible ratings allocated by the Constraints Engineer are shown
in Table 1.

Table 1 - Ratings allocated by the Constraints Engineer

After completion of a specific matrix, these "ratings" can be represented and
manipulated on a spreadsheet as discussed in paragraph 5.2.3.
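
As a rough illustration of how a completed three dimensional matrix might be flattened for a spreadsheet, the sketch below writes one row per (component, functional requirement) pair with one column per sub-element. The row and column layout, the function name and the sample ratings are all assumptions made for this example; the paper does not prescribe a spreadsheet layout.

```python
import csv
import io

SUB_ELEMENTS = ["confidentiality", "availability", "integrity", "assurance"]

# Hypothetical ratings keyed by (sub_element, functional_requirement, component).
ratings = {
    ("confidentiality", "issue landing request", "pilot"): 3,
    ("availability", "issue landing request", "pilot"): 2,
    ("integrity", "issue landing request", "pilot"): 3,
    ("assurance", "issue landing request", "pilot"): 1,
}


def to_spreadsheet(ratings, sub_elements):
    """Flatten the 3-D matrix into CSV text, one row per (component, requirement)."""
    for value in ratings.values():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"rating {value!r} is outside the 0-3 scale")
    rows = sorted({(z, y) for (_x, y, z) in ratings})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["component", "functional requirement", *sub_elements])
    for z, y in rows:
        # Cells that have not been rated yet are left blank.
        writer.writerow([z, y, *(ratings.get((x, y, z), "") for x in sub_elements)])
    return out.getvalue()


sheet = to_spreadsheet(ratings, SUB_ELEMENTS)
```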

Although completion of the requirements matrixes will be time consuming it is similar
to some of the processes used in some risk analysis packages. This step by step
determination of the requirements will guarantee a complete set of CSSR
requirements.

4.2.3  The Counter-measure matrixes

A domain of systems introduces different objects that are related to each other in a
specific way, and therefore the counter measures for the risks introduced by these
systems must also be the "same" but possibly of different intensity. For example for
a runway control system, the pilots will always issue landing requests. The security
requirement can differ depending on the organization that will use the system. A
military environment might require more security measures than a cargo carrier.

The ratings on the requirements matrix done by the Constraints Engineer can be used
for determining a set of necessary counter measures for a specific CSSR requirement.
The Counter-measure matrix contains the counter measures for a specific domain of
systems.

These counter-measure matrixes are built up by the Constraints
Engineer after developing numerous systems. A counter-measure matrix will
represent the ratings of the Constraints Engineer, as in Table 1, in relation to:

•  the different sub-elements of the specific CSSR requirement. For example,
the different sub-elements for the computer security requirement are
confidentiality, availability, integrity and assurance

•  the environment of the system (See paragraph 4.2)

Using the rating of the Constraints Engineer, the counter measures applicable to the
specific CSSR requirement can be determined.

This matrix will form the logical link between the CSSR requirements and the counter
measures needed to address the CSSR requirements.

A Counter-measure matrix is represented as a three dimensional matrix with:

•  the x-axis representing the rating 0-3

•  the y-axis representing the sub-elements of the specific CSSR requirement.
For example, for the Computer Security Counter measures matrix (CSCM-
matrix) the y-axis will be confidentiality, availability, integrity and assurance.
The same format holds for the Safety Counter-measures matrix (SCM-matrix),
and the Resilience Counter-measure matrix (RCM-matrix).

•  the z-axis giving the relevant components of the computer system (the
environment).

Each  cell  (x,y,z)  of  the  matrix  will  contain  the  counter  measures  applicable  for 
a  domain  of  systems. 
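
A minimal sketch of such a matrix, assuming a dictionary keyed by (rating, sub-element, component), is given below. The counter measures listed in the cells are invented for illustration; in the methodology they would come from the Constraints Engineer's experience with the domain. The function name and the choice to return no counter measures for a rating of 0 are likewise assumptions.

```python
# Hypothetical fragment of a CSCM-matrix keyed by (rating, sub_element, component).
CSCM = {
    (1, "confidentiality", "pilot"): ["user identification"],
    (2, "confidentiality", "pilot"): ["user identification", "password"],
    (3, "confidentiality", "pilot"): ["user identification", "password", "encrypted link"],
}


def counter_measures(matrix, rating, sub_element, component):
    """Look up the counter measures stored in cell (rating, sub_element, component)."""
    if rating not in (0, 1, 2, 3):
        raise ValueError(f"rating {rating!r} is outside the 0-3 scale")
    # A cell that is absent (for example any rating of 0 here) yields no measures.
    return list(matrix.get((rating, sub_element, component), []))


needed = counter_measures(CSCM, 2, "confidentiality", "pilot")
```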

Figure 3 below indicates the CSCM, SCM and RCM matrixes.

Figure 3 - Counter measure matrixes

It should be clear to the reader that these Counter-measure matrixes are created
through experience and expertise in the relevant areas.

5  CONSTRAINTS ACQUISITION METHODOLOGY (CAM)

Having now discussed the CSSR Requirements-matrixes, and the CSSR Counter-measure
matrixes, we now show how the Constraints Engineer uses these matrixes
to build CSSR into the functional system requirements.

The elicitation step will start off with an initial "discussion" between the User/s, the
Requirements Engineer and the Constraints Engineer. The rest of this article will
concentrate on the role of the Constraints Engineer.

The elicitation step usually takes the form of a "brain-storming session", whose goal
is achieving a consensus among a group of users about what they want. During the
elicitation step, the requirements engineer acts as a facilitator. [3] During this phase
the Constraints Engineer will:

5.1  Classify the system or part thereof, according to the initial specified functional
requirement, in a domain. The following are examples of domains:

•  safety critical systems

•  process control systems

•  on-line, networking systems

•  batch systems

5.2  Complete the CSSR Requirement matrixes

5.2.1  Construct the framework ("lay-out") of the CSSR requirement matrixes by
determining:

•  the classes of users

•  the relevant components of the computer system that will be applicable to the
requirements (the y and z axis) (See section 4.2.1)

5.2.2  Populate each of the 3 requirement matrixes by rating the need for the specific CSSR
requirement in relation to the functional requirement and the relevant components of
the computer system, for each of the 3 CSSR-requirements, i.e. Computer security,
Safety and Resilience. Ratings are determined as set out in Table 1.

Determining the specific rating can be done by restating each requirement as a
question. For example, what is the requirement for confidentiality by functional
requirement 1 used for user 1? This question can be answered by:

•  the  user 

•  existing  policies 

•  domain knowledge, for example it is accepted that a pin code should be
included in an auto teller request

The answer to this question can be rated by the requirements engineer on a scale 0
to 3, 0 meaning no requirement and 3 meaning an important requirement. If the
question can be answered by more than one source, an average rating can be used.

Each CSSR requirements matrix must be completed for each of the functional
requirements (y-axis), and each of the elements in the environment (z-axis).
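
The averaging of ratings from several sources can be sketched as follows. This is an illustration under stated assumptions: the function name and the source labels are hypothetical, and the paper does not say whether the average should be rounded, so the sketch returns the plain arithmetic mean and leaves any rounding to the engineer.

```python
from statistics import mean


def combined_rating(answers):
    """Average the 0-3 ratings given by several sources for one matrix cell."""
    if not answers:
        raise ValueError("at least one source must supply a rating")
    for source, rating in answers.items():
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"{source}: rating {rating!r} is outside the 0-3 scale")
    return mean(list(answers.values()))


# Hypothetical answers to "what is the requirement for confidentiality
# by functional requirement 1 used for user 1?"
cell_rating = combined_rating({"user": 3, "existing policy": 2})
```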


5.2.3  Represent the resultant information on a spreadsheet.

Having determined the content of the CSSR-matrixes during the elicitation phase, it
is now used during the formalization phase.

During the formalization step of requirements acquisition the Constraints Engineer:

5.3  Represents these ratings on the initial high level functional flow diagram or context
diagram if applicable.

5.4  Maps the requirements matrixes to the counter measure matrixes for this domain of
systems.

5.5  From these matrixes the counter measures for the CSSR requirements can be listed
per requirement. Cross references to counter measures can be eliminated at this
stage.

At this stage the counter measures necessary to achieve the required computer
security, safety and resilience in the newly planned system have been determined, and
can now be "designed into" the system right from the beginning.

The CSSR requirements have now been determined using the Constraints Acquisition
Methodology described above, in a thorough step-by-step way and not by any
ad hoc guesses.
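
Steps 5.4 and 5.5 can be sketched together as a lookup followed by a per-requirement listing. The sketch assumes the dictionary representations used for illustration here (ratings keyed by sub-element, requirement and component; counter measures keyed by rating, sub-element and component) and interprets "eliminating cross references" as removing a counter measure that is already listed for the same requirement. That interpretation, the function name and all sample data are assumptions.

```python
def list_counter_measures(ratings, cm_matrix):
    """Map rated requirement cells to counter measures, listed per requirement."""
    per_requirement = {}
    for (sub_element, requirement, component), rating in ratings.items():
        measures = cm_matrix.get((rating, sub_element, component), [])
        listed = per_requirement.setdefault(requirement, [])
        for measure in measures:
            # Skip a counter measure already listed for this requirement.
            if measure not in listed:
                listed.append(measure)
    return per_requirement


# Hypothetical requirements matrix cells and counter-measure matrix cells.
ratings = {
    ("confidentiality", "issue landing request", "pilot"): 2,
    ("integrity", "issue landing request", "pilot"): 3,
    ("availability", "issue landing request", "pilot"): 0,
}
cm_matrix = {
    (2, "confidentiality", "pilot"): ["password", "audit trail"],
    (3, "integrity", "pilot"): ["message checksum", "audit trail"],
}

listing = list_counter_measures(ratings, cm_matrix)
```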


6  CONSTRAINTS ACQUISITION METHODOLOGY FOR SOFTWARE (CAM/S)

CAM/S extends CAM to determine and specify the CSSR requirements during the
software development life cycle. CAM/S uses some of the deliverables of CAM and
produces the table of counter measures for the requirements to be implemented in
software. A similar methodology, CAM/H, can be developed that will address the
requirements to be implemented in hardware.

6.1  INPUTS  AND  MAJOR  DELIVERABLES  OF  CAM/S 


The following figure gives an indication of the relationship between CAM and
CAM/S.

Figure 4 - Inputs and deliverables of CAM/S

6.2  CAM/S METHODOLOGY


6.2.1  Construct the set of requirements matrixes for software. Using the set of
requirements matrixes from CAM, identify only those requirements for
implementation in software. The following figure (figure 5) shows the requirements
matrix for computer security. In this case requirements 2, 7, ... n were those
specifically relevant to software implementation. Similar matrixes must be
constructed for safety and resilience.

Figure 5 - Requirements matrix for computer security

6.2.2  Complete the ratings of the requirements matrix on the spreadsheet. An example is
given in Fig. 6 of the requirements matrix spreadsheet for computer security.

Figure 6 - Spreadsheet of the Requirements matrix
for Computer Security

6.2.3  Determine the detail requirements and construct the new set of Requirements
matrixes. An example of the requirements matrix for computer security is indicated
in Fig 7 below. This example shows that requirement 2 was expanded into
requirements 2.1 - 2.3. These detailed requirements are indicated on the y-axis of the
requirements matrix.

Figure 7 - Requirements matrix (Computer security) for
function to be implemented in software.

To populate each of the requirements matrixes make use of the process as
described in paragraph 5.2.2.

Represent the resultant information on the spreadsheet. Indicate the
information from the Requirements matrix, from CAM.


Figure 8 - Spreadsheet for the requirements
matrix (computer security) for software requirements.

6.2.4  Represent these ratings on a high level flow diagram.

6.2.5  Map these ratings of the requirements matrix to the counter measure matrix. This
counter measure matrix will only contain the counter measures applicable to
requirements to be implemented in software.

6.2.6  List the counter measures per requirement.
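
The selection and expansion steps of CAM/S (6.2.1 to 6.2.3) can be sketched as below. This is an illustrative reading rather than the authors' implementation: it assumes the CAM ratings are held in a dictionary keyed by (sub-element, requirement, component), that requirements absent from the hypothetical `expansions` table are not implemented in software, and that each detail requirement starts unrated while carrying its parent's CAM rating for reference on the spreadsheet.

```python
def expand_for_software(cam_ratings, expansions):
    """Build CAM/S cells: keep software requirements and split them into detail rows."""
    cam_s = {}
    for (sub_element, requirement, component), cam_rating in cam_ratings.items():
        # Requirements without an entry in `expansions` are treated as non-software.
        for detail in expansions.get(requirement, []):
            cam_s[(sub_element, detail, component)] = {
                "cam_rating": cam_rating,  # information carried over from CAM
                "rating": None,            # to be rated as in paragraph 5.2.2
            }
    return cam_s


# Hypothetical CAM ratings; only requirement 2 is implemented in software.
cam_ratings = {
    ("confidentiality", "1", "user 1"): 1,
    ("confidentiality", "2", "user 1"): 3,
}
expansions = {"2": ["2.1", "2.2", "2.3"]}

cam_s_matrix = expand_for_software(cam_ratings, expansions)
```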
Abstract. Distributed computer-based systems are
complex  due  to  their  size,  heterogeneous  nature,  and  the 
dynamic  interdependency  of  their  components. 
Hardware  and  software  are  usually  developed  by  a 
number  of  companies  which  the  prime  contractor  and 
customer  must  rigorously  monitor  and  control. 
Engineers and managers need traceability for control.
They  must  trace  requirements  and  design  decision 
dependencies  to  create  a  complete  and  consistent 
design,  to  understand  the  impact  of  change,  and  to 
perform  re-engineering  without  introducing  errors. 
Automatic  compilation  of  software  and  silicon  may 
eventually  eliminate  the  need  for  traceability  between 
formal  specification  and  end-product,  but  traceability 
will  still  be  needed  for  tracing  textual  requirements  and 
design  decisions  to  formal  specifications  and  test  cases. 
This  paper  discusses  the  need  for  traceability,  current 
practice  and  feasibility. 

INTRODUCTION 

Managers  need  traceability  to  check  status  and 
completeness,  and  engineers  need  traces  to  develop, 
test,  and  change  the  system.  Unfortunately,  there  is 
little  software  engineering  research  on  traceability 
(Finkelstein 91). Researchers are concentrating their
efforts  on  automatically  transforming  detailed  formal 
specifications  into  efficient  software  programs.  When 
changes  are  necessary,  engineers  will  change  the  formal 
specification  and  the  "specification  compiler”  will 
generate  new  software.  Automated  software  generation 
is  already  feasible  for  software  that  is  not  distributed  or 
high  performance.  If  hardware  becomes  sufficiently 
efficient,  specification  compilers  will  become  a  reality. 
At  that  time,  traceability  from  formal  software 
specifications  to  code  will  no  longer  be  necessary. 
However, traceability will still be needed from high
level goals, policies, and textual requirements to the
formal specification and test cases.

The  system  engineer's  need  for  traceability  cannot 
be  resolved  by  automated  code  generation.  System 
design and validation require tracing from high level
mission  goals  to  policies  for  achieving  those  goals,  and 
to  system  requirements,  design  concepts,  tradeoffs, 
decisions,  and  rationale.  Higher  level  system 
requirements  must  be  allocated  and  traced  to 
component  requirements  and  traced  to  test  cases  to 
make sure the system requirements are met.


Dorfman estimates that there are 250 requirements
in the hierarchy for every system level requirement
(Dorfman 91). This creates a vast network of inter-related
information. Traceability of all requirements
and development information is prohibitive. It may be
unnecessary or undesirable (considering overhead) to
maintain linkages between less significant or non-critical
requirements and every output related to them
(Ramesh et al. 92).

Nevertheless,  we  should  apply  traceability  with  as 
much  rigor  as  possible.  One  of  the  reasons  for  tracing 
information is to support re-design when requirements
are modified, which can happen during or after
development. It is difficult to predict in advance what
may  change;  even  non-critical  requirements  may  be 
modified.  Without  traceability,  requirements  changes 
may  not  be  communicated  to  a  group  affected  by  the 
requirement  flowdown.  Our  experience  is  consistent 
with  Dorfman's:  "more  of  the  requirements  problems 
observed  in  system  development  are  due  to  failures  in 
requirements  management  than  in  the  technical 
functions”  (Dorfman  91). 

To  support  impact  assessment  and  redesign,  we  need 
more  than  a  connection  between  document  paragraphs, 
and  more  than  a  link  between  a  requirement  and  a 
designed function or test. We must be able to trace
threads of behavior from an Operational Concept
Document to detailed threads in a system behavioral
model. We must trace subsystem design decisions to be
sure they are not in conflict with system design
decisions, track that fault tolerant and other
non-functional requirements are met, and validate that an
entire requirement is satisfied when the requirement
traces to more than one entity.
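
Impact assessment of this kind can be pictured as a reachability query over trace links. The following is a minimal sketch, assuming a hypothetical link table; the artifact identifiers (`OCD-3.2`, `SYS-17`, and so on) and the function name `impacted` are invented for illustration and are not taken from any project or tool discussed here.

```python
from collections import deque

# Hypothetical trace links: each artifact maps to the artifacts derived from it.
TRACE_LINKS = {
    "OCD-3.2": ["SYS-17"],
    "SYS-17": ["SUB-A-4", "SUB-B-9"],
    "SUB-A-4": ["TEST-101"],
    "SUB-B-9": ["TEST-102", "TEST-103"],
}

def impacted(start, links):
    """Return every artifact reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        for child in links.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

if __name__ == "__main__":
    # A change to SYS-17 touches both subsystem requirements and their tests.
    print(impacted("SYS-17", TRACE_LINKS))
```

Because one requirement here traces to two subsystem requirements, the query returns both branches, which is the case the text singles out: the whole requirement is satisfied only if every entity it traces to is re-verified.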

TRACING  BEHAVIOR 

An Operational Concept Document (OCD) describes
how the system will be used. It includes a description
of environment actions (operator or other system) and
corresponding system actions. System behavioral
requirements, which should be traced from the OCD,
are usually specified using stimulus-response threads
that cross function boundaries. Unfortunately, current
traceability techniques are limited for tracing behavior
(DoD Software Technology Strategy 91).

Behavior is difficult to trace using current methods
because current trace mechanisms link functions rather
than threads. Methods that link functions associate
stimulus-response requirements with significantly large
system subsets or the entire system.

Contractors  should  be  tracing  behavioral  threads  to 
more  detailed  behavior,  and  associating  this  behavior 
with  test  cases  that  drive  simulations,  prototypes,  and 
implementations. 
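
The difference between linking functions and linking threads can be illustrated with a small data structure. This is a sketch only, assuming a hypothetical `Thread` record with a `refines` link to its parent thread and a list of test case identifiers; the thread names and test IDs are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    """A stimulus-response thread: an ordered path, not a set of functions."""
    name: str
    steps: list                      # ordered steps from stimulus to response
    refines: str = ""                # name of the higher-level thread, if any
    tests: list = field(default_factory=list)

# Hypothetical threads: one from the OCD and two refinements of it.
THREADS = [
    Thread("OCD: operator requests track", ["request", "display track"]),
    Thread("SYS: track query", ["keypress", "query store", "draw"],
           refines="OCD: operator requests track", tests=["TC-12"]),
    Thread("SYS: track query, store offline", ["keypress", "timeout", "alert"],
           refines="OCD: operator requests track"),
]

def untested(threads):
    """Names of refined threads that have no associated test case."""
    return [t.name for t in threads if t.refines and not t.tests]

def lineage(name, threads):
    """Follow `refines` links upward from a detailed thread."""
    by_name = {t.name: t for t in threads}
    chain = [name]
    while by_name[chain[-1]].refines:
        chain.append(by_name[chain[-1]].refines)
    return chain

if __name__ == "__main__":
    print(untested(THREADS))
    print(lineage("SYS: track query", THREADS))
```

Keeping the steps ordered is the point of the sketch: the exceptional path (store offline) is a separate thread with its own test obligation, which a function-level link would not distinguish from the normal path.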

TRACING  PRODUCT  ATTRIBUTES 

Product attributes are difficult to trace due to their
pervasive nature. We discuss the difficulties for several
of these attributes.

Timing has numeric constraints and is associated
with normal and exceptional behavior paths that link
input to output. Timing can be traced if we solve the
behavior tracing problem. The problem is that
stimulus-response  paths  include  operating  system  calls, 
database  accesses,  network  interaction,  and  man- 
machine  interface  functions,  as  well  as  application 
functions.  Timing  is  also  dependent  on  resource 
utilization,  so  degraded  conditions  must  be  considered, 
vastly  increasing  the  number  of  links. 

Efficiency  is  a  function  of  required  processing  and 
resource  utilization.  Reliability  is  mean  time  to  failure 
in operational use and is related to failure during test.
Availability is related to reliability and mean time to
repair.  These  attributes  are  frequently  associated  with 
system  components,  but  inter-component  interfaces  can 
affect  the  attribute. 
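
As a worked illustration of how availability depends on reliability and repair time, the sketch below uses the standard steady-state relation, availability = MTTF / (MTTF + MTTR), which is a textbook formula and not one stated in this paper. Treating an inter-component interface as one more element in a series chain, with independent failures, is an assumption made for the example; all numbers are invented.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability from mean time to failure and to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)

def series_availability(parts):
    """Availability of elements that must all be up (independence assumed)."""
    result = 1.0
    for a in parts:
        result *= a
    return result

if __name__ == "__main__":
    processor = availability(990.0, 10.0)    # hypothetical component figures
    database = availability(1980.0, 20.0)
    interface = availability(4995.0, 5.0)    # the link between the two
    print(series_availability([processor, database]))
    print(series_availability([processor, database, interface]))
```

Adding the interface to the chain lowers the system figure even though neither component changed, which is why an attribute allocated only to components can be missed by component-level traces.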

Safe,  survivable,  and  secure  are  rated  by  criticality 
level. Designers try to partition the system so that
"highly critical" applies to a small part of the system,
but  this  is  difficult  as  we  do  not  know  how  to  create 
impenetrable  boundaries.  Verifying  that  attributes  are 
met  is  equivalent  to  proving  that  bad  things  cannot 
happen,  which  is  difficult  or  impossible.  Therefore, 
strict  design  and  coding  principles  must  be  followed, 
and  we  must  trace  flowdown  and  verification  so  that 
verifications  can  be  repeated  if  a  change  occurs. 

Fault  tolerance  is  related  to  a  pervasive  philosophy 
(e.g.,  fail-operational/fail-safe,  hot  standby,  compart- 
mentalization)  and  to  hardware  and  software  design 
decisions. Tracing whether the fail-operational/fail-safe
philosophy has been met is equivalent to inspecting all
design and implementation. It is difficult to test a
system for fault tolerance, as we cannot introduce every
type of fault. Every test that results in a failure should
be analyzed for additional fault tolerance requirements,
causing iterative test requirements, design, code, test
cycles, and iterative traces.

User-friendly is considered a characteristic of
man-machine interfaces, including menus, screens, and
on-line help. It means the system is easy to learn, easy to
use, and pleasant to use. Processing and communication
delays, subsystem data availability, and
diagnostic data availability affect user-friendliness
during operation and training. Delays and data
availability involve many system parts, creating
traceability problems.

The  pervasive  nature  of  most  product  attributes 
indicates that it is not feasible to trace these attributes to
the  lowest  level.  To  understand  what  must  be  traced, 
we  need  a  defined  process  for  specifying  and  verifying 
product  attributes. 

CONCLUSIONS 

When  contractors  say  they  perform  requirements 
management,  they  generally  trace  document 
paragraphs,  functions,  test  cases,  and  resource 
utilization.  This  level  of  requirements  management  is 
minimal,  compared  to  what  is  needed.  Traceability 
should be part of an overall process for requirements/design
flowdown and verification. This process
must indicate what should be defined, traced, inspected,
modeled, tested, and analyzed.

Distributed Design of Computer-Based Systems: Needed Academic Programs

Dr. Julian Holtzman
CECASE/University of Kansas


Abstract This paper summarizes the portion of
the panel session on Distributed Design of Computer
Based Systems devoted to a discussion of academic
programs. Some of the key elements identified by the
CBSE Task Force for Distributed Design of Computer
Based Systems in both research and practice are briefly
reviewed. Academic programs and some comments on
supporting research follow. The basic conclusion is
that current teaching programs are inadequate in content
and emphasis on the Systems Engineering aspects and
that university-based research, in general, lacks a focus
on solving problems of significance to practitioners and
rarely is 'scalable' to real-world situations.

Introduction 

The  IEEE  Task  Force  on  Computer  Based  Systems 
Engineering  has  been  meeting  in  workshop  formats  for 
the  past  two-and-a-half  years.  Many  key  issues 
pertinent  to  the  System  Engineering  of  Computer  Based 
Systems  are  being  identified  and  becoming  the  subject 
of active working groups. Presently there are six
working groups, including Process, Models, Tools,
Education, Case Studies, and Research.

The  Education  Working  group  has  been  reviewing 
existing  programs,  with  special  emphasis  on  the  issues 
and  results  arising  from  the  other  technical  groups.  The 
consensus  has  been  that  few  academic  programs 
provide  the  background  for  their  graduates  to  perform 
the functions of CBSE and that, in general,
university-based research does not meet industry needs.

Review of Some Key CBSE Issues

In  a  paper  to  be  published  shortly  in  the  IEEE 
Computer Journal, the panel members (and some others
from  the  CBSE  group)  have  stated  the  following. 

The nature of CBSs requires a different systems
engineering knowledge base than that normally
required to engineer non-CBSs. All CBSs involve
application software and associated services that are
conceptual in nature and inherently difficult to grasp.
Requirements satisfied by software are frequently
ambiguous and subject to change. This leads to design
changes that may sacrifice system architecture
flexibility to ensure performance requirements are
met. Furthermore, software changes in complex CBSs
can result in unpredictable behavior, both internal and
external to the CBS. The nature of distributed CBS(s)
is unique in that CBS resources are frequently
geographically dispersed and under the control of
different organizations. To exchange data among such
systems requires interfaces to describe content, and
protocols to describe format.

The  paper  proceeds  to  identify  critical  issues  and 
reports  on  the  current  state  of  the  practice.  For 
purposes of this panel, we summarize below some of
the key issues pertinent to academic programs.

CBSE is concerned with the following
responsibilities in addition to those of traditional
Systems Engineering:

* Design decisions concerning the distributed
nature of the CBS (its architecture).

* Allocation of resources to component developers
and management of the coordination process.

* Allocation of functions and data to CBS
resources (processors, software, datastores, displays,
Human Computer Interface).

* CBS strategies with respect to safety, security,
and fault tolerance.

* Global system management strategy.

* Performance allocations (timing, sizing,
availability).

* Testing (component, integration, interoperability
with the external environment).

* Logistics support (maintenance, training).

* Implementation of the CBS within the existing
environment (e.g., bandwidth, memory size, I/O
subsystem, database system), system environment
constraints (e.g., operational environment, security
measures), and performance thresholds (e.g.,
timeliness, throughput, availability); that is, the more
traditional Systems Engineering issues.

Considerations  from  an  Academic  Perspective 

A  general  observation  at  this  point  is  that  CBSE 
education  must  be  grounded  in  both  Computer 
Engineering  and  Computer  Science  but  with  a  strong 
emphasis  on  System  Engineering  issues.  Review  of 
some  key  academic  programs  in  Computer  Engineering 
in  North  America  revealed  that  most  of  the  programs 
reflected  the  underpinnings  of  the  normal  Electrical 
Engineering  programs:  e.g.  communications,  VLSI 
design, and so forth. Only in the last few years have
Computer  Engineering  programs  begun  to  take  on  their 
own  identity. 

Computer Science programs also reflect the
emphasis of their origin rather than the needs of the
practitioners. Most are based on abstract mathematical
theory, stress provability more than what will
work, and seem driven by the desire to produce people
to write compilers.

Other CBSE related programs are extensions of
Industrial Engineering, Operations Research, and so
forth.  Interestingly,  these  programs  address  some  key 
systems  engineering  issues  but  lack  the  depth  in  the 
background  discipline  areas  needed  for  CBSE. 

Undergraduate programs commonly are greatly
lacking in a systems engineering emphasis, or even
flavor. The traditional departments, over the last twenty
or thirty years, have erected almost insurmountable
barriers between themselves at the same time these
barriers are breaking down in industry. Specialties are
dropped in one department and picked up by another
(e.g., control theory from EE to Aerospace) rather than
being shared by interdisciplinary groups of interested
parties. Nonetheless, it is important to address the
complex system engineering issues in all disciplines,
and it may require courses labeled 'Systems
Engineering for XX' to meet the current academic
cultural environment.

Most undergraduate programs concentrate on what
can be termed grading by artifact. That is, very little
attention is given to the method used in obtaining a
solution; rather, we grade the specific answer. Thus it
should not be a surprise that graduates of such programs
do not understand the process of problem solving but
are specialists in very narrow particular solution
methods. Moreover, little attention is given to seeking
and evaluating alternate solutions. Optimisation is
rarely examined, and certainly even more rarely is
system optimisation considered. Of special concern to
a CBSE program is the concentration on multiple small
problems that are presented to the students. They do
not usually have many, or good, experiences in large,
team-oriented projects of the kind that would be
representative of a CBS.

There is little exposure to, and experience in,
microeconomics. Instead, a single course in macroeconomics
is included in most programs. It is little wonder, then,
that engineers and systems engineers find themselves
the 'victims' of cost effectiveness analyses rather than
being in charge of them.

Traditional university research is focused on
publications in refereed journals: many of them, and rapidly.
Since both the tenure and promotion considerations
hinge on publications, concentration on industrial
strength and industry-pertinent research is not present.
Typically, a research problem is selected that will yield
publishable results without much concern as to the
realism of the simplifying assumptions. Results of the
research are rarely tested against real benchmarks. As
in the design of courses, the problems selected are
usually of small size. Scaling to commercial-sized
problems is almost never done.

Recommendations 

A fundamental issue the CBSE Education Working
Group is debating is the level; i.e., undergraduate versus
master. There are proponents of an undergraduate
program, in spite of the reservation that it is highly
unlikely that an individual four or five years out of high
school will perform systems engineering work even at
the detail design level. A consensus seems to be
developing among the working group members that a
hybrid undergraduate program concentrating on
discipline fundamentals with a strong emphasis on true
design principles, addressing many of the issues raised
above, is the desired course of action. A very early
draft of a model program will be given during the panel
discussion.

There  is  general  agreement  between  NCOSE 
members  and  CBSE  Task  Force  members  that  an 
identifiable  Systems  Engineering  program  at  the 
graduate  level  is  a  necessity.  Both  organizations  have 
working groups addressing this problem. During the
panel session, we will present a preliminary draft of a
Master's level program for CBSE with two tracks: one
predominantly technical with a strong emphasis on
formal methods, and the second with a strong flavor of
management. The reality of program duration
constraints suggests two tracks, even though a single
track with elements of each would be more desirable.

There  are  several  extremely  serious  issues  that  must 
be  addressed  beyond  the  technical  content  of  academic 
programs.  The  research  content  and  nature  of  graduate 
programs  requires  definition  and  support.  The  source 
of  faculty  with  the  appropriate  CBSE  expertise  is  of 
concern.  It  is  most  likely  that  participation  by  industry 
will  be  required.  Some  of  the  cultural  aspects  have 
been  discussed  above.  A  publication  from  the  National 
Research  Council  outlining  the  need  for  a  resurgence  in 
both  research  and  teaching  in  design  observed  that 
universities  tend  to,  first,  deny  there  is  a  problem  and, 
second,  state  that  it  is  impossible  to  change.  NCOSE 
and the Task Force, in partnership with academia, have
a significant effort in front of them.

Abstract The distributed design of computer based
systems requires a methodology that supports both the
modeling of the distributed system and the ability to
assess the impact of requirements and engineering change
while the system is at any stage of development. Model
Based Systems Engineering, MBSE, provides an
approach to these twinned problems with the advantages of
tailorability to application, culture, notations, and tools.
It is critical that traceability linkages be constructed in
the normal course of engineering development and that
they be adequate for an analysis of the impact of change.

Modeling  of  Distributed  Systems 

The systems modeling of large distributed systems is
done in tiers, beginning at context level and continuing
to a separation tier where hardware, software, the tasks
of people, and facilities can be separated and specified
independently. The systems modeling must assure that
the implementations from all these disciplines will work
together upon integration. This requires the careful
modeling of threads through the system as shown in Figure 1.

The threads can be threads of control which may or
may not involve people in the loop. The threads may be
stimulus response threads which may or may not involve
people in the thread. These threads pass through many
layers and components of the system and it is the system
responsiveness that is critical. There may be several
independent threads which can occur randomly in time and
can interfere under special occurrence conditions. It is
critical that such situations be modeled and that
simultaneous need for the same resource be prevented. The use
of bus structures, complex queues, and distributed data
storage must be carefully evaluated.
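
A simple way to picture the interference check is to list when each thread needs each resource and look for overlapping demands. The sketch below is an assumption-laden simplification: it uses fixed time windows, whereas the threads described above occur randomly in time, and the thread names, resources, and timings are all hypothetical.

```python
from itertools import combinations

# Hypothetical resource demands: (thread, resource, start_ms, end_ms).
DEMANDS = [
    ("track_update", "data_bus", 0, 4),
    ("operator_cmd", "data_bus", 3, 6),
    ("operator_cmd", "display", 6, 9),
    ("track_update", "track_store", 4, 7),
]

def conflicts(demands):
    """Pairs of demands from different threads that overlap on one resource."""
    found = []
    for a, b in combinations(demands, 2):
        same_resource = a[1] == b[1] and a[0] != b[0]
        overlap = a[2] < b[3] and b[2] < a[3]   # half-open interval test
        if same_resource and overlap:
            found.append((a[1], a[0], b[0]))
    return found

if __name__ == "__main__":
    print(conflicts(DEMANDS))
```

In an executable model the windows would come from running the threads together rather than from a static table, but the same overlap test identifies which bus, queue, or data store needs attention.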

Thread analysis begins at context level by thoroughly
modeling the behavior of the external components or
entities in the environment. A methodology, MBSE, has
been described for accomplishing this (Oliver 1993a),
(Oliver 1993b), (Oliver 1993c), in a manner that is
tailorable to different representations of the information,
different notations, and different tools. The seven core
engineering steps and their application to tiers of
decomposition are shown in Figure 2.

Figure 1. Critical threads through a system which has been fully decomposed
to separate software, hardware, and operator components

These models capture behavior, what a component
does, and are called Conceptual or What Models. The
models are executable. They link, for traceability, to the
Operations Concept documents. As the behavior of the
system is modeled in a hierarchy of increasing detail,
What Models in the same executable notation are
developed at each tier. The external behaviors are linked to the
system behavior and executed together. This process
drives threads and time lines through the entire system to
assess resource conflicts.

The thread analysis also supports the development of
the Sequential Build and Test Plan, step 7. Issues can be
raised at any point of development and the issue
descriptions link to the associated entities in the models.
Resolution of the issues is recorded and linked similarly.
Inherent in the method are trade-off and optimization. The
behavior of all components must be modeled. Data store
notation is not adequate. Requests to operating systems,
busses, and DBMSs must be modeled.

Information models, How Models, have been
described (Oliver 1993a) for each of the seven core
engineering steps. They show the relationships among all the
entities in the models. Under the sponsorship of the IEEE
Task Force on Computer Based Systems Engineering the
core steps and their information models are being
scrutinized by Systems Engineering groups around the world
for correctness and completeness.

This paper uses a drawing tool, a spread sheet,
and functional strings to support high level
analysis of a conceptual system. By providing the
data required and investigating suspicious areas,
the Systems Engineer can minimize the number
of surprises that occur later in the
developmental process. The same spread sheet
information can be built on to assess and present
the difference in cost and performance compared
to a baseline.

Simple  mistakes  are  the  most  embarrassing. 

When a system's function and interfaces are
defined in a static structure, it is often
discovered that it could not perform the
minimum operations dynamically. When
discovered in an early stage of the design, the
problem may be easy to correct, but it is still
embarrassing to get caught by a design oversight.

The purpose of this short paper is to offer a
method to define and analyze a static view of a
functional allocation. This is done by looking at
estimates of the best case and worst case
dynamic limits of the allocated system, which
allows  an  understanding  of  the  functional 
operation  within  a  given  set  of  system 
constraints.  Performing  this  analysis  in  a 
complex  system  requires  the  use  of  an 
accounting  system  that  is  as  simple  as  the  pencil 
but  has  the  power  of  the  computer.  A  spread 
sheet  program  fits  the  bill.  This  is  done  using  a 
top  level  equation: 

\[
\text{Dynamic\_Capability\_USED} =
\sum \left( \%\text{I/O\_used\_per\_function} \pm \text{tolerances} \right)
+ \sum \left( \%\text{CPU\_used\_per\_function} \pm \text{tolerances} \right)
+ \text{Reserve}
\]

When this equation and the spread sheet become
ineffective, move the analysis to a data base or
systems engineering design/analysis tool.
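
The top level equation is easy to evaluate outside a spread sheet as well. The following is a minimal sketch, assuming each function contributes an (estimate, tolerance) pair in percent and that tolerances simply add in the worst case; that stacking rule, the function names, and the figures are assumptions for illustration, not part of the method as published.

```python
def capability_used(io_pct, cpu_pct, reserve_pct):
    """Return (best, nominal, worst) percent of dynamic capability used.

    io_pct and cpu_pct are lists of (estimate, tolerance) pairs,
    one pair per allocated function.
    """
    items = list(io_pct) + list(cpu_pct)
    nominal = sum(est for est, _ in items) + reserve_pct
    spread = sum(tol for _, tol in items)     # worst-case tolerance stacking
    return nominal - spread, nominal, nominal + spread

def fits(io_pct, cpu_pct, reserve_pct, limit=100):
    """Top level 'does it fit?' check against the worst case."""
    return capability_used(io_pct, cpu_pct, reserve_pct)[2] <= limit

if __name__ == "__main__":
    io = [(10, 2), (15, 3)]      # hypothetical per-function I/O estimates
    cpu = [(20, 5), (25, 5)]     # hypothetical per-function CPU estimates
    print(capability_used(io, cpu, reserve_pct=10))
    print(fits(io, cpu, reserve_pct=10))
```

Reporting best, nominal, and worst together mirrors the best case and worst case limits the method asks for, and the reserve is counted as used so that it cannot be quietly consumed by a later allocation.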

A technique used to identify and isolate strings of
linked computation used by models of high
performance computer systems is recommended
to support the spread sheet model. This simple
allocation of known or desired information put
into an operational (time based) string allows
high level accounting practices to be used to keep
the books on a percentage capability used basis.
These strings can be parallel or serial
operations. The strings can have bounded
interactions showing communication or
constraints. The details of the operations can be
at a high level (see Figure 1), and the dynamics
need only include best and worst case
performance estimates. Interactions through
physical communication channels include needed
volume and rate estimates with best and worst
case limits.

The view of a simple system (Figure 1) can be
deceptive. The following example points out how
easy it is to assemble a set of operations and
tolerances and match them with implementation
capability and constraints. Then one can perform
a top level "does it fit?" analysis. The model is
easy to implement with spread sheet tools such
as Excel or Lotus.


Figure 1. Defined Operations and Allocated Structure


How should the model work? All good models are
assembled to answer the specific questions of the
operation and performance within the
constraints of the proposed implementation. This
model will be analyzed for bottlenecks and
performance tolerances. It could also answer
questions about allocation of maximum or
minimum performance or many other numeric
data, such as errors or allocated reliability.
Once the model is created (defined), simple
statistical or numerical analysis techniques
built into spread sheets can be used to automate
and display the data. A graphical representation
with the spread sheet's numbers helps provide
consistency of the model at the level of detail
included.

The following example uses a top level
description of a process to perform an
operational constraints analysis. The system is
defined by Figure 1. The flow through the system
uses ab, I, K, O, gh, Q, ef, L, M, N, & P
communication paths through 9 processes to
finally get to the output.

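Estimated_Dynamic_Capability_USED =

\[
\begin{aligned}
\text{Input\_used} &= 40\% \pm 10\% \\
\text{Process\_used} &= 60\% \pm 20\% \\
\text{Output\_used} &= 70\% \pm 20\% \\
\text{Required\_Reserve} &= 10\% \\
\text{Estimated\_Confidence\_level} &= 95\%
\end{aligned}
\]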

This  model  has  more  than  81  potential  states  of 
operation.  The  allocated  processes  and 
communications  could  require  a  significant 
effort to completely define or simulate. Using
Excel's r function, if there is a skew between
the estimates and the operational data, the
margin is about 18% and is still safe.

Figure 2. This model depicts a more detailed
description of the system with normal processes
defined. The string analogy shows processes and
how they are allocated resources and
communications by stringing a series of
functions that would be followed in the system's
normal operation. The two normal strings are:
A: "IN-A-ab-B-K-F-P-X-Out"; and B:
"IN-A-I-E-ef-F-P-X-Out". Both normal threads
have error correcting strings "L-C-M-D-N-B-K-F"
and "O-G-gh-H-Q-E" that add
complexity to the allocating processes to nodes
B, E & F. The effect of these two looped back
strings can be analyzed using the spread sheet.


Assuming equal distribution between A & B, node
F is loaded to 60%. If A uses its secondary
string 50% of the time, the load on F jumps to
90% load. If the loading distributions are not
normal then there is likely an intermittent
problem with the system's operation. Using the
statistical package within the spread sheet
allows the generation of a complete set of data on
this potential bottleneck.
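
The bookkeeping behind such a bottleneck estimate can be sketched in a few lines. The strings below are one reading of the Figure 2 strings, and the model is an assumption: each pass of an active string through a node costs a fixed 60% of that node, weighted by the fraction of time the string is active. Under that reading the sketch happens to reproduce the 60% and 90% figures for node F, but the paper does not state how its numbers were derived.

```python
from collections import Counter

NODES = set("ABCDEFGHX")          # process nodes; other tokens are paths

def node_visits(string):
    """Count how often each process node appears in a dash-separated string."""
    return Counter(tok for tok in string.split("-") if tok in NODES)

def node_load(strings, node, cost_per_visit):
    """Expected load on `node`: sum of weight * visits * cost_per_visit."""
    return sum(w * node_visits(s)[node] * cost_per_visit for s, w in strings)

# Normal strings A and B, each active half of the time (assumed weights).
NORMAL = [("IN-A-ab-B-K-F-P-X-Out", 0.5), ("IN-A-I-E-ef-F-P-X-Out", 0.5)]
# A's error correcting string, assumed active 50% of the time.
WITH_RETRY = NORMAL + [("L-C-M-D-N-B-K-F", 0.5)]

if __name__ == "__main__":
    print(node_load(NORMAL, "F", cost_per_visit=60))
    print(node_load(WITH_RETRY, "F", cost_per_visit=60))
```

The same function can be pointed at nodes B or E to see how the looped back strings shift load there, which is the kind of question the spread sheet is meant to answer before a detailed simulation exists.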


Figure 2. Defined Operations and Allocated Structure

String analysis uses a set of heuristics to group,
classify and analyze static and dynamic state
space at a high level. The technique does not
require assumptions to be atomic at any level.
Lower level detailed interactions and constraints
are included in the top system description only
when necessary. The analysis moves to Petri
Nets or complete tools such as RDD or Statemate
as the information and precision increases.

Abstract. This paper introduces a method
of modeling that uses spread sheet tools to
support the analysis of high level function
and performance allocations early in the
design. The thrust of this short paper is in
support of the Computer Based Systems
Engineering Panel discussion of the need
for modeling during system design.

In this panel we propose to consider tradeoffs between system security and other
critical system requirements, where by system security we mean the maintenance of data
confidentiality and integrity in the face of hostile intruders who are actively trying to
subvert the security policy. In the past, security was usually identified with confidentiality,
and it was usually considered more or less in isolation from other system properties,
particularly in the DoD. That is, the main function of a secure system was to be secure;
other properties were usually secondary. Such an approach was adequate when it was
applied to the problem of protecting data in stand-alone operating systems that had no
other critical requirements and were not part of any larger system. But now we are
seeing the need for more complex systems with other requirements as critical as security,
including availability, reliability, and timing constraints. It is these kinds of features that
are often most in conflict with security requirements. Guaranteeing secrecy may mean
making both the data and the system less available. Requiring that the system not
operate in certain insecure modes may make it less reliable. Finally, the time involved in
enforcing secrecy and integrity constraints may negatively affect performance, both
because of the time involved in checking the constraints, and because the restriction of
information flow to secure channels may prevent information from being processed in the
fastest and most convenient way.

In  order  to  build  a  system  in  which  various  critical  properties  can  be  guaranteed  to 
an  acceptable  degree,  it  is  necessary  to  understand  the  tradeoffs  between  security  and  the 
other  properties.  In  this  panel  we  are  gathering  together  researchers  who  are  working  on 
computer security as it affects various other possibly critical properties of a system. We
plan  to  consider  the  following  questions: 

[1]  What effect does enforcing security requirements have on a system's ability to meet
other requirements, such as reliability, availability, and real-time requirements? In
particular, what kinds of security requirements are in direct conflict with other
systems requirements, and what kinds can be seen as working together with other
systems requirements? For example, in what ways is a secure system more or less
reliable, and in what ways is a reliable system more or less secure?

[2]  Are there aspects of security (e.g., covert channel rates) that can be easily quantified
and compared with other quantifiable requirements? What aspects cannot be so
easily quantified? What can be done to make them more quantifiable?

[3]  Are there techniques from security that can be helpful in assuring that a system
meets its other requirements, or vice versa?

[4]  If a system must meet other requirements such as dependability, etc., are there any
kinds of security requirements that it is more likely to have? What information
from systems development is most likely to give us an answer to this question?

[5]  Does  one  need  to  think  of  security  in  a  different  kind  of  way  when  thinking  of  it  in 
conjunction  with  other  system  requirements?  In  particular,  does  one  need  a  different 
way  of  developing  and  defining  security  policies? 

[6]  Properties  not  usually  associated  with  security  (e.g.,  timing  or  reliability)  may  also 
be  considered  security  properties  when  they  must  be  maintained  in  face  of  hostile 
attack.  How  does  this  change  the  way  we  think  about  them,  and  how  does  it  change 
the  way  we  think  about  security?  Finally,  how  does  thinking  of  these  properties  in 
this  way  affect  the  tradeoffs? 

[7]  How  compatible  with  other  system  objectives  are  the  assurance  techniques  used  in 
computer  security?  How  are  assurance  problems  addressed  in  other,  related 
communities? 

Proposed  Panelists: 

Catherine  Meadows,  NRL,  chair 
Marshall  Abrams,  MITRE 
Teresa  Lunt,  SRI 
Carl  Landwehr,  NRL 


5113  LEESBURG  PK  STE  514 
FALLS  CHURCH  VA  22305- 
PHONE:  (703)845-0040 
FAX:  (703)845-0042 
EMAIL: 


MACK  ALFORD 
M/S: 

ASCENT  LOGIC  CORP 

180  ROSE  ORCHARD  WAY  STE  200 

SAN  JOSE  CA  95134- 

PHONE:  (408)943-0830 

FAX:  (408)943-0705 

EMAIL:  ALFORD@ALC.COM 


OSMAN  BALCI 

M/S:  DEPARTMENT  OF  CS 

VIRGINIA  TECH 

BLACKSBURG  VA  24061-0106 
PHONE:  (703)231-4841 
FAX:  (703)231-6075 
EMAIL:  BALCI@VTOPUS.CS.VT.EDU 


BRUCE  BLUM 
M/S: 

JHU/APL 

JOHNS  HOPKINS  RD 

LAUREL  MD  20723-6099 

PHONE:  (301)953-6235 

FAX:  (301)953-6904 

EMAIL:  BIB@APLCOMM.JHUAPL.EDU 


LUKE  CAMPBELL 
M/S:  BLDG  2035  SY30 
NAWC-AD/PAX 

PAX  RIVER  MD  20670- 
PHONE:  (301)826-7601 
FAX:  (301)826-7607 

EMAIL:  LCAMPBEL@TECNET1.JCTE.JCS.MIL 


THOMAS  C.  CHOINSKI 
M/S:  CODE  2151 
NUWCDET 
BUILDING  80 
NEW  LONDON  CT  06320- 
PHONE:  (203)440-5391 
FAX:  (203)440-5243 
EMAIL: 


MICHAEL  EDWARDS 
M/S:  CODE  B40 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-4187 
FAX:  (301)394-1175 

EMAIL:  MEDWARD@NSWC-WO.NAVY.MIL 


ED  ANDERT 
M/S: 

CONCEPTUAL  SOFTWARE  SYSTEMS 
P  O  BOX  727 

YORBA  LINDA  CA  92686-0727 
PHONE:  (714)996-2935 
FAX:  (714)572-1950 
EMAIL:  ANDERT@ORION.OAC.UCI.EDU 


SHERRY  BARKER 
M/S:  CODE  B10 
NSWCDD 

DAHLGREN  VA  22448-5000 
PHONE:  (703)663-7378 
FAX: 

EMAIL: 


DAVID  BRITTON 
M/S: 

TRIDENT  SYSTEMS  INC 
10201  LEE  HWY  STE  300 
FAIRFAX  VA  22030- 
PHONE:  (703)691-7792 
FAX:  (703)273-6608 
EMAIL: 


JOSEPH  CHIARA 
M/S:  MTE 

AF  SPACE  AND  MISSILES  CEN 
LAAFB 

LOS  ANGELES  CA  90009- 
PHONE:  (310)363-3521 
FAX:  (310)363-0265 
EMAIL:  CHIARA@MT2.LAAFB.AF.MIL 


DANIEL  DAYTON 
M/S: 

JRS  RESEARCH  LABS 
1036  W  TAFT  AVE 
ORANGE  CA  92665-4121 
PHONE:  (714)974-2201 
FAX:  (714)974-2540 
EMAIL:  DAN@JRS.COM 


WILLIAM  EVANCO 
M/S:  M/S  2385 
MITRE  CORP 
7525  COLSHIRE  DR 
MCLEAN  VA  22102-3481 
PHONE:  (703)883-6102 
FAX:  (703)883-5787 
EMAIL:  EVANCO@MITRE.ORG 


RICHARD  EVANS 

M/S:  ST  II  RM133 

GEORGE  MASON  UNIVERSITY 

4400  UNIVERSITY  DR 

FAIRFAX  VA  22030-4444 

PHONE:  (703)993-3724 

FAX:  (703)993-1821 

EMAIL:  REVANS@GMUVAX2.GMU.EDU 


ARMEN  GABRIELIAN 
M/S: 

UNIVIEW  SYSTEMS 
1192  ELENA  PRIVADA 
MOUNTAIN  VIEW  CA  94040- 
PHONE:  (415)968-3478 
FAX:  (415)968-3476 
EMAIL:  ARMEN@WELL.SF.CA.US 


ROBERT  GOETTGE 
M/S: 

AST  INC 

12200  E.  BRIARWOOD  AVE.  STE  260 
ENGLEWOOD  CO  80112- 
PHONE:  (303)790-4242 
FAX:  (303)790-2816 
EMAIL: 


STEVE  HARRISON 
M/S:  CODE  2153 
NUWCDET 

NEW  LONDON  CT  06320- 

PHONE:  (203)440-6153 

FAX:  (203)440-5987 

EMAIL:  HARRISONSJ@NUSC.NAVY.MIL 


JULIAN  HOLTZMAN 

M/S:  CE/CASE 

UNIVERSITY  OF  KANSAS 

2291  IRVING  HILL  RD  NICHOLS  HALL 

LAWRENCE  KS  66045- 

PHONE:  (913)864-7759 

FAX:  (913)864-7789 

EMAIL:  HOLTZMAN@KUHAB.CC.UKANS.EDU 


PHILIP  HWANG 
M/S:  CODE  A-10 
DMA  HOmS) 

8613  LEE  HWY 

FAIRFAX  VA  22031-2137 

PHONE:  (703)285-9236 

FAX:  (703)285-9396 

EMAIL:  POHWANG@CS.UMD.EDU 


WILLIAM  FARR 
M/S:  CODE  B10 
NSWCDD 

DAHLGREN  VA  22448-5000 
PHONE:  (703)663-8388 
FAX:  (703)663-4568 
EMAIL:  WFARR@S850.MWC.EDU 


DENNIS  GARROOD 
M/S: 

ALLIANT  TECHSYSTEMS  INC 
6500  HARBOUR  HEIGHTS  PKWY 
EVERETT  WA  98275- 
PHONE:  (206)356-3293 
FAX:  (206)356-3185 
EMAIL: 


JEFFERY  GRADY 

M/S:  SPACE  SYSTEMS  DIVISION 

GENERAL  DYNAMICS 

6015  CHARAE  ST 

SAN  DIEGO  CA  92122- 

PHONE:  (619)547-7108 

FAX:  (619)974-4000 

EMAIL: 


ROGER  HILLSON 
M/S:  CODE  5583 

NAVAL  RESEARCH  LABORATORY 
4555  OVERLOOK  AVE  SW 
WASHINGTON  DC  20375-5337 
PHONE:  (202)404-7332 
FAX:  (202)767-1122 
EMAIL:  HILLSON@AIT.NRL.NAVY.MIL 


STEVEN  HOWELL 
M/S:  CODE  B40 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE. 

SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-3987 
FAX:  (301)394-1175 
EMAIL:  SHOWELL@NSWC-WO.NAVY.MIL 


FARNAM  JAHANIAN 
M/S:  MS  H2-B22 
IBM  RESEARCH 
P.O.  BOX  704 

YORKTOWN  HEIGHTS  NY  10598- 
PHONE:  (914)784-7498 
FAX:  (914)784-7455 
EMAIL:  FARNAM@WATSON.IBM.COM 


JAMES  FRANCIS 
M/S: 

STRATEGIC  INSIGHT  LTD 
2011  CRYSTAL  DR  STE  101 
ARLINGTON  VA  22202- 
PHONE:  (703)553-9700 
FAX:  (703)553-9665 
EMAIL: 


JOSEPH  GERSTNER 
M/S: 

XRF 

8370  GREENSBORO  DR.  #919 
MCLEAN  VA  22102- 
PHONE:  (703)442-9020 
FAX:  (703)442-9020 
EMAIL:  JGERST@CS.GMU.EDU 


ROBERT  HALLIGAN 
M/S: 

TECHNOLOGY  AUSTRALASIA  PTY  LTD 
1010  DONCASTER  RD 
DONCASTER  E  VIC  AUSTRALIA  3109 
PHONE:  61.3.841.9733 
FAX:  61.3.841.8374 
EMAIL: 


NGOCDUNG  HOANG 
M/S:  CODE  B40 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-4877 
FAX:  (301)394-1175 
EMAIL:  NHOANG@NSWC-WO.NAVY.MIL 


MICHELLE  HUGUE 

M/S:  AEROSPACE  TECHNOLOGY  CENT 

ALLIED-SIGNAL  AEROSPACE  CO 

9140  OLD  ANNAPOLIS  RD 

COLUMBIA  MD  21045- 

PHONE:  (410)964-4158 

FAX:  (410)992-5813 

EMAIL:  MMH@BATC.ALLIED.COM 


RALPH  JEFFORDS 

M/S:  CODE  5546 

NAVAL  RESEARCH  LABORATORY 

4555  OVERLOOK  AVE  SW 

WASHINGTON  DC  20375-5000 

PHONE:  (202)404-8493 

FAX:  (202)404-7942 

EMAIL:  JEFFORDS@ITD.NRL.NAVY.MIL 


DAVID  JENNINGS 
M/S:  CODE  KSI 
NSWCDD 

DAHLGREN  VA  22448- 
PHONE:  (703)663-8157 
FAX:  (703)663-4568 

EMAIL:  DJENNIN@RELAY.NSWC.NAVY.MIL 


JEE-IN  KIM 

M/S: 

COMPUTER  COMMAND  AND  CONTROL 
2300  CHESTNUT  ST  STE  230 
PHILADELPHIA  PA  19103- 
PHONE:  (215)854-0555 
FAX:  (215)854-0665 
EMAIL:  KIM@CCCC.COM 


NAOUFEL  KRAIEM 
M/S:  MASI/CRI 
UNIVERSITY  OF  PARIS  I 
17  RUE  DE  TOLBIAC 
PARIS  FRANCE  75013- 
PHONE:  44.24.93.65 
FAX:  45.86.76.66 
EMAIL:  NAOUFEL@MASI.IBP.FR 


SILE 

M/S:  CODE  N51 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-1971 
FAX:  (301)394-4130 
EMAIL:  SLE@NSWC-WO.NAVY.MIL 


EVAN  LOCK 

M/S: 

COMPUTER  COMMAND  AND  CONTROL 
2300  CHESTNUT  ST  STE  230 
PHILADELPHIA  PA  19103- 
PHONE:  (215)854-0555 
FAX:  (215)854-0665 
EMAIL:  LOCK@CCCC.COM 


CATHERINE  MEADOWS 
M/S:  CODE  5543 

NAVAL  RESEARCH  LABORATORY 
4555  OVERLOOK  AVE  SW 
WASHINGTON  DC  20375- 
PHONE:  (202)767-3490 
FAX:  (202)404-7942 
EMAIL:  MEADOWS@ITD.NRL.NAVY.MIL 


ALLEN  JOHNSON 
M/S: 

RAINBOW  SYSTEMS  ANALYSIS  GRP 

8920  BUSINESS  PARK  DR 

AUSTIN  TX  78759- 

PHONE:  (512)346-7999 

FAX:  (512)794-9997 

EMAIL: 


KANE  KIM 

M/S:  DEPARTMENT  OF  E/CE 
UNIVERSITY  OF  CALIFORNIA 

IRVINE  CA  92717- 

PHONE:  (714)856-5542 

FAX:  (714)856-4076 

EMAIL:  KANE@BALBOA.ENG.UCI.EDU  OR 

KANE@ICS.UCI 


BRUCE  LABAW 

M/S:  CODE  5546 

NAVAL  RESEARCH  LABORATORY 

4555  OVERLOOK  AVE  SW 

WASHINGTON  DC  20375- 

PHONE:  (202)767-3249 

FAX:  (202)404-7942 

EMAIL:  LABAW@ITD.NRL.NAVY.MIL 


KWEI-JAY  LIN 

M/S:  DEPARTMENT  OF  ECE 

UNIVERSITY  OF  CALIFORNIA 

IRVINE  CA  92717- 
PHONE:  (714)856-7839 
FAX:  (714)725-3203 
EMAIL:  KLIN@UCI.EDU 


TERESA  LUNT 
M/S:  EL245 
SRI  INTERNATIONAL 
333  RAVENSWOOD  AVE 
MENLO  PARK  CA  94025- 
PHONE:  (415)859-6106 
FAX:  (415)859-2844 
EMAIL:  LUNT@CSL.SRI.COM 


JEFFREY  MILLER 
M/S: 

SOHAR  INC 

133  ROLLINS  AVE  STE  5B 
ROCKVILLE  MD  20852- 
PHONE:  (301)230-5654 
FAX:  (703)734-6119 
EMAIL: 


NICHOLAS  KARANGELEN 
M/S: 

TRIDENT  SYSTEMS  INCORPORATED 
10201  LEE  HWY  SUITE  300 
FAIRFAX  VA  22030- 
PHONE:  (703)273-1012 
FAX:  (703)273-6608 

EMAIL:  NKARANG@NSWC-WO.NAVY.MIL 


GARY  KOOB 

M/S:  CODE  3332 

OFFICE  OF  CNR 

800  N  QUINCY  ST 

ARLINGTON  VA  22217-5660 

PHONE:  (703)696-0872 

FAX: 

EMAIL: 


CARL  LANDWEHR 
M/S:  CODE  5542 

NAVAL  RESEARCH  LABORATORY 

4555  OVERLOOK  AVE  SW 

WASHINGTON  DC  20375- 

PHONE:  (202)767-3381 

FAX:  (202)404-7942 

EMAIL:  LANDWEHR@ITD.NRL.NAVY.MIL 


JANE  W.S.  LIU 
M/S:  DEPARTMENT  OF  CS 
UNIVERSITY  OF  ILLINOIS 
1304  W.  SPRINGFIELD  AVENUE 
URBANA  IL  61801- 
PHONE:  (217)333-0135 
FAX:  (217)333-3501 
EMAIL:  JANELIU@CS.UIUC.EDU 


W.L.  MCCOY 
M/S:  CODE  B10 
NSWCDD 

DAHLGREN  VA  22448-5000 
PHONE:  (703)663-8367 
FAX:  (703)663-4568 
EMAIL: 


DANIEL  MOSTERT 
M/S:  DEPARTMENT  FOR  CS 
RAND  AFRIKAANS  UNIVERSITY 
P.  O.  BOX  524 

JOHANNESBURG  S  AFRIC  2000- 
PHONE:  27.11.4892847 
FAX:  27.11.4892138 
EMAIL:  BASIE@RKW.RAU.AC.ZA 


GILBERT  MYERS 
M/S:  CODE  41 
NRAD 

271  CATALINA  BLVD 
SAN  DIEGO  CA  92152-5000 
PHONE:  (619)553-4136 
FAX:  (619)553- 
EMAIL:  GMYERS@NOSC.MIL 


DAVID  OLIVER 
M/S: 

GE  CORPORATE  R&D  CTR 
P.O.  BOX  8 

SCHENECTADY  NY  12301- 
PHONE:  (518)387-6458 
FAX:  (518)387-6104 
EMAIL:  OLIVERDW@CRD.GE.COM 


MOHSEN  PAZIRANDEH 
M/S: 

INNOVATIVE  RESEARCH  INC. 

180  COOK  ST  STE  315 
DENVER  CO  80206- 
PHONE:  (303)321-4955 
FAX: 

EMAIL:  MOHSEN@CS.COLORADO.EDU 


BALA  RAMESH 

M/S:  CODE  AS/RA 

NAVAL  POSTGRADUATE  SCHOOL 

MONTEREY  CA  93943- 
PHONE:  (408)656-2439 
FAX:  (408)656-3407 
EMAIL:  RAMESH@NPS.NAVY.MIL 


ANDRES  RUDMIK 
M/S: 

SPS  INC 
122  N  4TH  AVE 
INDIALANTIC  FL  32903- 
PHONE:  (407)984-3370 
FAX:  (407)728-3957 
EMAIL:  AXR@SPS.COM 


RAJIV  SAIN 
M/S: 

AUTOMATED  SCIENCES  GRP 
16349  DAHLGREN  RD 
DAHLGREN  VA  22448- 
PHONE:  (703)663-9231 
FAX:  (703)663-3717 
EMAIL: 


JOHN  NALLON 

M/S:  MS  8404 

TEXAS  INSTRUMENTS 

6500  CHASE  OAKS  BLVD 

PLANO  TX  75023- 

PHONE:  (214)575-3450 

FAX:  (214)575-5847 

EMAIL:  NALLON@DOO.ITG.TI.COM 


DANIEL  ORGAN 

M/S:  CODE  2151 

NUWCDET 

BUILDING  80 

NEW  LONDON  CT  06320- 

PHONE:  (203)440-6546 

FAX:  (203)440-5243 

EMAIL: 


DAR-TZEN  PENG 
M/S: 

ALLIED-SIGNAL  MTC 
9140  OLD  ANNAPOLIS  RD 
COLUMBIA  MD  21045-1998 
PHONE:  (301)964-4195 
FAX:  (301)992-5813 
EMAIL:  DTP@BATC.ALLIED.COM 


JOHN  REILLY 
M/S:  CODE  KS4 
NSWCDD 

DAHLGREN  VA  22448- 
PHONE:  (703)663-7257 
FAX:  (703)663-4568 
EMAIL: 


JOHN  RUMBUT 
M/S:  CODE  2222 
NUWC 

BLDG.  1171-3 
NEWPORT  RI  02841-2047 
PHONE:  (401)841-3616 
FAX:  (401)841- 

EMAIL:  RUMBUT@ADA.NPT.NUWC.NAVY.MIL 


SAUMYA  SANYAL 
M/S:  Ml  55 
FMC 

4800  E  RIVER  RD 
MINNEAPOLIS  MN  55421- 
PHONE:  (612)572-7577 
FAX:  (612)572-4991 
EMAIL:  SANYALSK@NSD.FMC.COM 


SWAMINATHAN  NATARAJAN 
M/S:  DEPARTMENT  OF  CS 
TEXAS  A&M  UNIVERSITY 

COLLEGE  STATION  TX  77843-3112 
PHONE:  (409)845-8287 
FAX:  (409)847-8578 
EMAIL:  SWAMI@CS.TAMU.EDU 


DAVID  OWENS 
M/S: 

PARAMAX  SYSTEMS  CANADA 
6111  AVENUE  ROYALMOUNT 
MONTREAL  QE  H4P  1K6 
PHONE:  (514)340-7031 
FAX:  (514)340-8318 
EMAIL: 


PARAMESWARAN  RAMANATHAN 
M/S:  DEPARTMENT  OF  E/CE 
UNIVERSITY  OF  WISCONSIN 
1415  JOHNSON  DR 
MADISON  WI  53706-1691 
PHONE:  (608)263-0557 
FAX:  (608)262-1267 
EMAIL:  PARMESH@ECE.WISC.EDU 


CHARLES  ROBERTSON 

M/S:  ENGINEERING  SUPPORT 

AUTOMATED  SCIENCES  GROUP  INC. 

P.O.  BOX  1750 

DAHLGREN  VA  22448- 

PHONE:  (703)663-5231 

FAX:  (703)663-3717 

EMAIL: 


CHARLES  SADEK 
M/S:  CODE  B40 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-5187 
FAX: 

EMAIL: 


RICHARD  SCALZO 
M/S:  CODE  A10 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-2926 
FAX:  (301)394-1164 
EMAIL:  RSCALZO@NSWC-WO.NAVY.MIL 


CARL  SCHMIEDEKAMP 

M/S:  CODE  7033 
NAWC-AD 
P.O.  BOX  5152 

WARMINSTER  PA  18974-0531 
PHONE:  (215)441-1779 
FAX:  (215)441-3225 
EMAIL:  CARLS@NADC.NAVY.MIL 


JAMES  SMITH 
M/S: 

OFFICE  OF  THE  CNR 

800  N  QUINCY  ST 

ARLINGTON  VA  22217-5000 

PHONE:  (703)696-5752 

FAX:  (703)696-1330 

EMAIL:  JGSMITH@ITD.NRL.NAVY.MIL 


JAY  STROSNIDER 
M/S:  DEPARTMENT  OF  E/CE 
CARNEGIE  MELLON  UNIVERSITY 
5000  FORBES  AVE 
PITTSBURGH  PA  15213- 
PHONE:  (412)268-6927 
FAX:  (412)268-3890 
EMAIL:  JKS@USA.ECE.CMU.EDU 


BEVERLEY  TANKSLEY 
M/S: 

MYSTECH  ASSOCIATES  INC. 
5205  LEESBURG  PIKE  STE  1200 
FALLS  CHURCH  VA  22042- 
PHONE:  (703)671-8680 
FAX:  (703)671-8932 
EMAIL: 


JAMES  WILLIAMSON 
M/S:  CODE  703A 
NAWC 

WARMINSTER  PA  18974- 
PHONE:  (215)441-1564 
FAX:  (215)441-3225 
EMAIL:  JAW@NADC.NAVY.MIL 


NORMAN  SCHNEIDEWIND 

M/S:  CODE  AS/SS 

NAVAL  POSTGRADUATE  SCHOOL 

MONTEREY  CA  93943- 

PHONE:  (408)656-2719 

FAX:  (408)656-3407 

EMAIL:  SCHNEIDEWIND@NPS.NAVY.MIL 


SANG  SON 

M/S:  DEPARTMENT  OF  CS 
UNIVERSITY  OF  VIRGINIA 
THORNTON  HALL 
CHARLOTTESVILLE  VA  22903- 
PHONE:  (804)982-2205 
FAX:  (804)982-2214 
EMAIL:  SON@VIRGINIA.EDU 


HAROLD  SZU 
M/S:  CODE  R44 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-3097 
FAX:  (301)394-3923 

EMAIL:  HSZU@ULYSSES.NSWC.NAVY.MIL 


LONNIE  WELCH 

M/S:  DEPARTMENT  OF  C/IS 

NEW  JERSEY  INSTITUTE  OF  TECH 

UNIVERSITY  HTS 

NEWARK  NJ  07102- 

PHONE:  (201)596-5683 

FAX:  (201)596-5777 

EMAIL:  WELCH@VIENNA.NJIT.EDU 


MARK  WILSON 
M/S:  CODE  B40 
NSWCDD 

10901  NEW  HAMPSHIRE  AVE 
SILVER  SPRING  MD  20903-5640 
PHONE:  (301)394-5099 
FAX:  (301)394-1175 
EMAIL:  MLWILSO@NSWC-WO.NAVY.MIL 


KANG  SHIN 

M/S:  DEPARTMENT  OF  EE/CS 
UNIVERSITY  OF  MICHIGAN 

ANN  ARBOR  MI  48109-2122 
PHONE:  (313)763-0391 
FAX:  (313)763-4617 
EMAIL:  KGSHIN@EECS.UMICH.EDU 


ALEXANDER  STOYENKO 

M/S:  DEPARTMENT  OF  C/IS 

NEW  JERSEY  INSTITUTE  OF  TECH 

UNIVERSITY  HTS 

NEWARK  NJ  07102- 

PHONE:  (201)596-5765 

FAX:  (201)596-5777 

EMAIL:  ALEX@VULCAN.NJIT.EDU 


MARY  TAMUCCI 
M/S: 

MYSTECH  ASSOCIATES  INC. 

5205  LEESBURG  PIKE  STE  1200 
FALLS  CHURCH  VA  22042- 
PHONE:  (703)671-8680 
FAX:  (703)671-8932 
EMAIL:  TAMUCCI@NUSC.NAVY.MIL 


STEPHANIE  WHITE 
M/S:  MS  A08-35 
GRUMMAN  CRC 

BETHPAGE  NY  11714-3580 
PHONE:  (516)575-2201 
FAX:  (516)575-7716 

EMAIL:  STEPH@GDSTECH.GRUMMAN.COM 


DISTRIBUTION 


Copies 

DOD  ACTIVITIES  (CONUS) 

ATTN  CODE  A-10 

(PHILLIP  HWANG)  10 

DEFENSE  MAPPING  AGENCY 
8613  LEE  HIGHWAY 
FAIRFAX  VA  22031-2137 

DEFENSE  TECHNICAL 
INFORMATION  CENTER 
CAMERON  STATION 
ALEXANDRIA  VA  22304-6145  12 

ATTN  CODE  5543 

(CATHERINE  MEADOWS)  1 

NAVAL  RESEARCH  LABORATORY 
4555  OVERLOOK  AVE  SW 
WASHINGTON  DC  20375 

ATTN  CODE  4411B 

(ELIZABETH  WALD)  1 

CODE  4411C 

(GRACIE  THOMPSON)  1 

OFFICE  OF  NAVAL  RESEARCH 
800  NORTH  QUINCY  STREET 
ARLINGTON  VA  22217-5000 

ATTN  CODE  E29L  4 

COASTAL  SYSTEMS  STATION 